LOCAL PARTIAL LIKELIHOOD ESTIMATION IN PROPORTIONAL HAZARDS REGRESSION
Fan, Gijbels and King [Ann. Statist. 25 (1997) 1661-1690] considered the estimation of the risk function $\psi(x)$ in the proportional hazards model. Their proposed estimator is based on integrating the estimated derivative function obtained through a local version of the partial likelihood. They proved the large sample properties of the derivative function, but the large sample properties of the estimator for the risk function itself were not established. In this paper, we consider direct estimation of the relative risk function $\psi(x_2) - \psi(x_1)$ for any location normalization point $x_1$. The main novelty in our approach is that we select observations in shrinking neighborhoods of both $x_1$ and $x_2$ when constructing a local version of the partial likelihood, whereas Fan, Gijbels and King [Ann. Statist. 25 (1997) 1661-1690] only concentrated on a single neighborhood, resulting in the cancellation of the risk function in the local likelihood function. The asymptotic properties of our estimator are rigorously established and the variance of the estimator is easily estimated. The idea behind our approach is extended to estimate the differences between groups. A simulation study is carried out.

1. Introduction. The Cox proportional hazards model is by far the most popular model in survival analysis. In the Cox model, the conditional hazard rate of a survival time, $T$, given the regressor vector, $X = x$, is modeled as

$$\lambda(t \mid x) = \lambda_0(t)\exp\{\psi(x)\}, \tag{1.1}$$

where $\lambda_0(t)$ is the baseline hazard function and $\psi(x)$ is the risk function, which measures the contribution of $X$ at $x$. Typically, the baseline hazard function is left unspecified, while the risk function is specified parametrically as $\psi(X) = X^T\beta$ for a vector of coefficients $\beta$, where the superscript $T$ denotes the transpose of the vector. In practice, survival data are often censored due to termination of the study or early withdrawal from the study. We can thus observe only $Y = \min\{T, C\}$, where $C$ is the censoring time, independent of $T$ given $X$. In addition, we also observe the censoring indicator, $\delta = I\{T \le C\}$, as well as the covariate vector $X$. Let $\{(X_i, Y_i, \delta_i), i = 1, \ldots, n\}$ represent an i.i.d. sample from the population $(X, \min(T, C), I\{T \le C\})$, let $t_1^0 < \cdots < t_N^0$ denote the ordered observed failure times and let $(j)$ provide the label for the item failing at $t_j^0$. Define $R_j$ as the risk set at time $t_j^0$: $R_j = \{i : Y_i \ge t_j^0\}$. Cox [5] suggested that estimation and inference on $\beta$ be based on the partial likelihood function

$$L(\beta) = \prod_{j=1}^{N} \frac{\exp(X_{(j)}^T\beta)}{\sum_{i \in R_j} \exp(X_i^T\beta)}.$$

With a flexible specification for the baseline hazard function, Cox's model and the partial likelihood approach provide a very convenient way to measure the covariate effects on the survival time. See [7, 16, 19] and [21] for references on this model. However, the parametric specification of the risk factor $\psi(x)$ is assumed largely for convenience. In general, misspecification of the risk function will lead to inconsistent estimation and misleading statistical inferences. Therefore, it is desirable to relax the parametric specification of the risk function. When the risk function $\psi(x)$ is not parametrically specified, the partial likelihood function is of the form

$$\prod_{j=1}^{N} \frac{\exp(\psi(X_{(j)}))}{\sum_{i \in R_j} \exp(\psi(X_i))}.$$

Note that the function $\psi(x)$ is only estimable up to a location normalization. In fact, in Cox's original setup with the linear specification for the risk function, no intercept is allowed in $\beta$. The relative risk provides all the information regarding the contribution of the covariates based on the Cox proportional hazards model. Recently, Fan, Gijbels and King [12] and Tibshirani and Hastie [23] considered local versions of the partial likelihood approach. They compared the relative risks for observations whose corresponding covariate values belong to a single shrinking neighborhood. Obviously, these observations all have the same risk factor to the first order. Namely, the first term of the Taylor expansion of $\psi(X_i)$ will be the same for $X_i$ close to $x$. As a result, their estimation is based on the second-order comparison of the relative risk factors, leading to estimators for the derivatives of the function $\psi(x)$ only. To recover the original risk function, they suggested using the expression $\psi(x) = \int_0^x \psi'(t)\,dt$, replacing $\psi'(t)$ with their local partial likelihood estimates $\hat\psi'(t)$ for $t \in (0, x)$. Note that $\psi(0) = 0$ is implicitly imposed for identifiability of $\psi(\cdot)$. However, the large sample property of the estimator $\int_0^x \hat\psi'(t)\,dt$ has not been formally established; thus, formal statistical inference is not feasible. Furthermore, their derivative-based approach cannot be extended to the case in which the covariate variable is discrete, since $\psi(\cdot)$ is canceled out in the local partial likelihood, as pointed out in Section 2.
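To make the cumulative nature of this reconstruction concrete, the sketch below integrates a grid of derivative values with the trapezoidal rule and imposes $\psi(0) = 0$. It is an illustration under our own assumptions (a fixed evaluation grid containing 0, derivative values already available; the names `integrate_derivative`, `grid` and `dpsi` are hypothetical), not the implementation of [12]. A local error in the derivative near $x = 0.3$ is carried to every integrated value beyond that point, while values on the other side of 0 are untouched.

```python
import numpy as np


def integrate_derivative(grid, dpsi):
    """Return psi_hat(x) = int_0^x dpsi(t) dt on an increasing grid.

    Uses the cumulative trapezoidal rule and imposes psi_hat(0) = 0,
    the normalization implicit in the integration approach.
    """
    grid = np.asarray(grid, dtype=float)
    dpsi = np.asarray(dpsi, dtype=float)
    steps = 0.5 * (dpsi[1:] + dpsi[:-1]) * np.diff(grid)
    cum = np.concatenate([[0.0], np.cumsum(steps)])
    zero = np.flatnonzero(np.isclose(grid, 0.0))
    if zero.size == 0:
        raise ValueError("grid must contain the normalization point 0")
    return cum - cum[zero[0]]


# Hypothetical example: psi(x) = x^2, so psi'(x) = 2x.
grid = np.linspace(-1.0, 1.0, 201)
exact = integrate_derivative(grid, 2.0 * grid)
# Add a local error to the derivative values around x = 0.3 only.
bump = 0.5 * (np.abs(grid - 0.3) < 0.05)
perturbed = integrate_derivative(grid, 2.0 * grid + bump)
print(np.max(np.abs(exact - grid**2)))  # tiny: the rule is exact for a linear psi'
print(perturbed[-1] - exact[-1])        # the local error is carried to x = 1
print(perturbed[0] - exact[0])          # 0.0: points on the other side of 0 are unaffected
```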

In this paper, we consider the direct estimation of the relative risk function $\psi(x_2) - \psi(x_1)$ through a new version of the local partial likelihood. Intuitively, in constructing our local partial likelihood, we use observations in the neighborhoods of either $x_1$ or $x_2$, which have risk factors that differ to the first order, thus enabling direct estimation of and inference on the relative risk $\psi(x_2) - \psi(x_1)$. Moreover, when the covariate variable is discrete, our approach reduces to the partial likelihood estimator for a two-sample comparison of survival times in the form of the usual proportional hazards model. In other words, our procedure reduces to an efficient estimator in the case of discrete covariates. Thus, we can expect our procedure to have high efficiency, at least when the data for the covariate variable $X$ are not evenly distributed.

Our procedure can be easily adapted to estimate differences in risk func- 
tions at any point x between two different groups by constructing our local 
partial likelihood using observations in the neighborhood of x for the two 
groups under consideration. We apply our procedure to the PBC data and 
find no treatment difference, which is consistent with findings from paramet- 
ric hazard regression. In an attempt to compare treatment differences, Fan 
and Gijbels [11] suggested estimating $\psi'(x)$ for the two treatments separately
and then integrating the derivative functions to recover the two risk func- 
tions. However, in this way each risk function is only estimable to within 
a constant. As a consequence, the treatment difference is not directly es- 
timable. Fan and Gijbels [11] thus imposed zero risk for both treatments at 
the left endpoint of the support of x. Our numerical analysis found that this 
assumption is inappropriate. 

There are some related studies on nonparametric regression techniques with censored data. Gentleman and Crowley [14] proposed an iterative algorithm to estimate $\psi(\cdot)$ with a uniform kernel function. Li and Doss [20] investigated the nonparametric estimation of the conditional hazard and distribution functions using local linear fits. O'Sullivan [22], Hastie and Tibshirani [15] and Kooperberg, Stone and Truong [17, 18] used spline methods to study the model.

This paper is organized as follows. Section 2 introduces our estimator and 
the idea is extended to the case when discrete covariates are also present. A 
numerical study is presented in Section 3 where we compare our procedure 
with that of [12]. We also apply our procedure to the PBC data. Section 4 
concludes the paper. 
2. Local partial likelihood estimators. Recall that the partial likelihood function for model (1.1) is

$$L = \prod_{j=1}^{N} \frac{\exp(\psi(X_{(j)}))}{\sum_{i \in R_j} \exp(\psi(X_i))} \tag{2.1}$$

for an i.i.d. sample. For notational simplicity, we assume that $X$ is a continuously distributed random variable. We discuss cases with discrete and multivariate covariates later.

Suppose now that the form of $\psi(x)$ is not specified and that the $p$th-order derivative exists at $x$. Then a local model [11] of $\psi(X)$ can be expressed as

$$\psi(X) \approx \psi(x) + \psi'(x)(X - x) + \cdots + \frac{\psi^{(p)}(x)}{p!}(X - x)^p \tag{2.2}$$

by Taylor expansion for $X$ in a neighborhood of $x$. Namely, $\psi(X) \approx \mathbf{X}^T\beta$ for $X$ close to $x$, where $\mathbf{X} = \{1, X - x, \ldots, (X - x)^p\}^T$ and $\beta = (\beta_0, \ldots, \beta_p)^T = \{\psi(x), \psi'(x), \ldots, \psi^{(p)}(x)/p!\}^T$. Let $K$ be a kernel function that smoothly downweighs the contribution of remote data points, let $h$ be the bandwidth parameter that controls the size of the local neighborhood and let $\mathbf{X}_i = \{1, X_i - x, \ldots, (X_i - x)^p\}^T$ for $i = 1, \ldots, n$. Fan, Gijbels and King [12] considered nonparametric estimation based on a local partial likelihood function

$$\sum_{j=1}^{N} K_h(X_{(j)} - x)\Big[\psi(X_{(j)}) - \log\Big\{\sum_{i \in R_j} \exp(\psi(X_i))K_h(X_i - x)\Big\}\Big]. \tag{2.3}$$

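Using the local model (2.2), Fan, Gijbels and King [12] proposed to estimate the $\beta^*$ defined below with the likelihood function

$$\begin{aligned} &\sum_{j=1}^{N} K_h(X_{(j)} - x)\Big[\mathbf{X}_{(j)}^T\beta - \log\Big\{\sum_{i \in R_j} \exp(\mathbf{X}_i^T\beta)K_h(X_i - x)\Big\}\Big] \\ &\quad = \sum_{j=1}^{N} K_h(X_{(j)} - x)\Big[\mathbf{X}_{(j)}^{*T}\beta^* - \log\Big\{\sum_{i \in R_j} \exp(\mathbf{X}_i^{*T}\beta^*)K_h(X_i - x)\Big\}\Big], \end{aligned} \tag{2.4}$$

where $K_h(t) = K(t/h)/h$, $\beta^* = (\beta_1, \ldots, \beta_p)^T$ and $\mathbf{X}_i^* = \{X_i - x, \ldots, (X_i - x)^p\}^T$.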

Note, however, that the function value $\psi(x)$ is not directly estimable, as (2.4) does not involve the intercept $\beta_0 = \psi(x)$, which has been canceled out.
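The cancellation of $\beta_0$ in (2.4) is easy to check numerically. The sketch below evaluates the left-hand side of (2.4) with a full coefficient vector, assuming an Epanechnikov kernel and untied observation times; `local_pl` is a hypothetical helper written for illustration, not code from [12]. Changing the intercept leaves the value unchanged, while changing the slope does not.

```python
import numpy as np


def local_pl(beta, x, y, delta, x0, h):
    """Left-hand side of (2.4) for beta = (beta_0, ..., beta_p).

    Epanechnikov kernel; assumes no tied observation times, so the risk
    set of the j-th ordered time consists of that observation and all later ones.
    """
    order = np.argsort(y)
    x, delta = x[order], delta[order]
    d = x - x0
    u = d / h
    k = 0.75 * np.clip(1.0 - u * u, 0.0, None) / h       # K_h(X_i - x0)
    design = np.vander(d, len(beta), increasing=True)  # columns 1, d, ..., d^p
    eta = design @ np.asarray(beta, dtype=float)
    risk = np.cumsum((np.exp(eta) * k)[::-1])[::-1]    # sums over R_j
    use = (delta == 1) & (k > 0)
    return np.sum(k[use] * (eta[use] - np.log(risk[use])))


rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-1.0, 1.0, n)
y = rng.exponential(1.0, n) * np.exp(-x)   # hypothetical data with psi(x) = x
delta = np.ones(n, dtype=int)
print(local_pl([0.0, 1.0], x, y, delta, 0.2, 0.3))
print(local_pl([5.0, 1.0], x, y, delta, 0.2, 0.3))  # same value: beta_0 cancels
```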
Tibshirani and Hastie [23] considered a similar approach using a nearest-neighborhood method. Furthermore, if $X$ is a discrete random variable taking on a finite number of values, a window around a value $x$ only contains that value itself. Therefore, (2.3) reduces to

$$\sum_{j=1}^{N} I(X_{(j)} = x)\Big[\psi(x) - \log\Big\{\sum_{i \in R_j} \exp(\psi(x))I(X_i = x)\Big\}\Big] = -\sum_{j=1}^{N} I(X_{(j)} = x)\log\Big\{\sum_{i \in R_j} I(X_i = x)\Big\},$$

which no longer depends on $\psi(\cdot)$. This approach is thus not applicable to the case of discrete covariates.



2.1. Estimation of the relative risk function. We consider direct estimation of $\psi(x_2) - \psi(x_1)$ for a normalization point $x_1$ and any other point $x_2$ in the domain of $x$. By including observations in the neighborhoods of either $x_1$ or $x_2$, we consider the following local partial likelihood, which essentially replaces $K_h(X_i - x)$ in (2.3) with $K_h(X_i - x_1) + K_h(X_i - x_2)$:

$$L_n = \sum_{j=1}^{N} \{K_h(X_{(j)} - x_1) + K_h(X_{(j)} - x_2)\}\Big[\psi(X_{(j)}) - \log\Big\{\sum_{i \in R_j} \exp(\psi(X_i))\{K_h(X_i - x_1) + K_h(X_i - x_2)\}\Big\}\Big]. \tag{2.5}$$

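Let $\alpha = \psi(x_2) - \psi(x_1)$ and, for $l = 1, 2$ and $i = 1, 2, \ldots, n$,

$$\mathbf{X}_{il} = \{1, X_i - x_l, \ldots, (X_i - x_l)^p\}^T, \qquad \mathbf{X}_{il}^* = \{X_i - x_l, \ldots, (X_i - x_l)^p\}^T,$$

$$\beta_{x_l} = (\beta_{x_l,0}, \beta_{x_l,1}, \ldots, \beta_{x_l,p})^T = \Big\{\psi(x_l), \psi'(x_l), \ldots, \frac{\psi^{(p)}(x_l)}{p!}\Big\}^T = (\beta_{x_l,0}, \beta_{x_l}^{*T})^T,$$

so that $\beta_{x_l}^* = (\beta_{x_l,1}, \ldots, \beta_{x_l,p})^T$.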



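Using the local models in the neighborhoods of $x_1$ and $x_2$, we obtain an approximation of $L_n$,

$$L_n \approx L_{n1} + L_{n2}, \tag{2.6}$$

where

$$\begin{aligned} L_{n1} &= \sum_{j=1}^{N} K_h(X_{(j)} - x_1)\mathbf{X}_{(j)1}^{T}\beta_{x_1} - \sum_{j=1}^{N} K_h(X_{(j)} - x_1)\log\Big[\sum_{i \in R_j}\{\exp(\mathbf{X}_{i1}^T\beta_{x_1})K_h(X_i - x_1) + \exp(\mathbf{X}_{i2}^T\beta_{x_2})K_h(X_i - x_2)\}\Big] \\ &= \sum_{j=1}^{N} K_h(X_{(j)} - x_1)\mathbf{X}_{(j)1}^{*T}\beta_{x_1}^* - \sum_{j=1}^{N} K_h(X_{(j)} - x_1)\log\Big[\sum_{i \in R_j}\{\exp(\mathbf{X}_{i1}^{*T}\beta_{x_1}^*)K_h(X_i - x_1) + \exp(\alpha + \mathbf{X}_{i2}^{*T}\beta_{x_2}^*)K_h(X_i - x_2)\}\Big] \end{aligned} \tag{2.7}$$

and, similarly,

$$L_{n2} = \sum_{j=1}^{N} K_h(X_{(j)} - x_2)(\alpha + \mathbf{X}_{(j)2}^{*T}\beta_{x_2}^*) - \sum_{j=1}^{N} K_h(X_{(j)} - x_2)\log\Big[\sum_{i \in R_j}\{\exp(\mathbf{X}_{i1}^{*T}\beta_{x_1}^*)K_h(X_i - x_1) + \exp(\alpha + \mathbf{X}_{i2}^{*T}\beta_{x_2}^*)K_h(X_i - x_2)\}\Big]. \tag{2.8}$$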

Clearly, our formulation of the local partial likelihood allows direct estimation of $\alpha = \psi(x_2) - \psi(x_1)$, in contrast to [12] and [23].

In principle, one can estimate $\psi(x_2) - \psi(x_1)$ by finding the value that maximizes (2.6). Intuitively, however, observations in the neighborhood of $x_1$ would not be informative about the derivatives of $\psi$ at $x_2$, or vice versa. Therefore, a one-step estimator, which simultaneously estimates $(\alpha, \beta_{x_1}^*, \beta_{x_2}^*)$ through maximization of (2.6), is not particularly appealing. Instead, we adopt the following two-step strategy. In the first step, we adopt the approach of [12] to obtain the estimates $(\hat\beta_{x_1}^*, \hat\beta_{x_2}^*)$ of $(\beta_{x_1}^*, \beta_{x_2}^*)$. In the second step, we propose to estimate $\alpha$ by $\hat\alpha$, which maximizes

$$L_n(\alpha, \hat\beta_{x_1}^*, \hat\beta_{x_2}^*) = L_{n1}(\alpha, \hat\beta_{x_1}^*, \hat\beta_{x_2}^*) + L_{n2}(\alpha, \hat\beta_{x_1}^*, \hat\beta_{x_2}^*). \tag{2.9}$$

It is also worth noting that this approach is computationally more attractive. In addition, we find from our simulation that the performance of the two-step estimator is more stable than that of the one-step estimator.
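A minimal sketch of the two-step estimator for the local linear case $p = 1$ follows. The assumptions are ours, not the paper's: an Epanechnikov kernel, no tied observation times, a local linear first step ($p_1 = 1$) in the spirit of [12], and root finding by bisection, which is valid here because both score functions are monotone decreasing in their argument. All function names are hypothetical.

```python
import numpy as np


def kh(t, h):
    """Scaled Epanechnikov kernel K_h(t) = K(t/h)/h."""
    u = t / h
    return 0.75 * np.clip(1.0 - u * u, 0.0, None) / h


def risk_sums(v):
    """Data sorted by increasing Y (no ties): sum of v over R_j = {i: Y_i >= Y_j}."""
    return np.cumsum(v[::-1])[::-1]


def bisect_decreasing(score, lo=-20.0, hi=20.0, iters=100):
    """Root of a monotone decreasing function on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def local_slope(x, y, delta, x0, h1):
    """Step 1: local linear estimate of psi'(x0) from a local partial likelihood."""
    order = np.argsort(y)
    d, dl = x[order] - x0, delta[order]
    k = kh(d, h1)
    w = k * dl                                 # kernel weights of observed failures
    use = w > 0

    def score(b):
        e = np.exp(b * d) * k
        s0, s1 = risk_sums(e), risk_sums(e * d)
        return np.sum(w[use] * (d[use] - s1[use] / s0[use]))

    return bisect_decreasing(score)


def relative_risk(x, y, delta, x1, x2, h, h1):
    """Step 2: alpha_hat maximizing (2.9) with the step-1 slopes plugged in."""
    b1 = local_slope(x, y, delta, x1, h1)
    b2 = local_slope(x, y, delta, x2, h1)
    order = np.argsort(y)
    xs, dl = x[order], delta[order]
    k1, k2 = kh(xs - x1, h), kh(xs - x2, h)
    A = risk_sums(np.exp(b1 * (xs - x1)) * k1)  # neighborhood of x1
    B = risk_sums(np.exp(b2 * (xs - x2)) * k2)  # neighborhood of x2, without e^alpha
    w1, w2 = k1 * dl, k2 * dl
    use = (w1 + w2) > 0

    def score(alpha):
        # Derivative of L_n1 + L_n2 with respect to alpha; decreasing in alpha.
        eb = np.exp(alpha) * B[use]
        return np.sum(w2[use]) - np.sum((w1[use] + w2[use]) * eb / (A[use] + eb))

    return bisect_decreasing(score)


rng = np.random.default_rng(1)
n = 3000
x = rng.uniform(-1.0, 1.0, n)
t = rng.exponential(1.0, n) * np.exp(-x)      # hypothetical psi(x) = x, lambda_0 = 1
c = rng.uniform(0.0, 4.0, n)
y, delta = np.minimum(t, c), (t <= c).astype(int)
print(relative_risk(x, y, delta, -0.5, 0.5, h=0.2, h1=0.25))  # true alpha is 1.0
```

The bandwidths in the example satisfy $h = 0.8h_1$, the ratio used later in the paper; any other choice is equally hypothetical here.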

Note that, by the construction of our estimator, only observations in the neighborhoods of either $x_1$ or $x_2$ affect the estimation of $\psi(x_2) - \psi(x_1)$. The approach of [12], on the other hand, is cumulative in nature, in the sense that estimating $\psi(x_2) - \psi(x_1)$ requires the estimation of $\psi'(x)$ for $x$ in the interval between $x_1$ and $x_2$; consequently, a likely drawback of their approach is that inaccurate estimation of $\psi'(x)$ for $x$ in a neighborhood belonging to $[x_1, x_2]$ will adversely affect the precision in estimating $\psi(x_2) - \psi(x_1)$, which utilizes the estimates of $\psi'(x)$ for all $x \in [x_1, x_2]$. Indeed, this observation is confirmed in our simulation study presented in Section 3.1.
We now consider the asymptotic properties of our estimator. Set

$$S(v \mid x) = P(Y > v \mid x).$$

We impose the following conditions:

1. the kernel function is a bounded symmetric density function with compact support;

2. the function $\psi(\cdot)$ has a continuous $(p_1 + 1)$st derivative around $x_1$ and $x_2$;

3. the density $f(\cdot)$ of $X$ is continuous at $x_1$ and $x_2$;

4. the conditional probability $S(v \mid \cdot)$ is equicontinuous at $x_1$ and $x_2$;

5. the local bandwidths $h$ and $h_1$ satisfy $h/h_1 \to 0$ and $nh \to \infty$; $nh^{2p+3}$ and … are both bounded, where $p_1 > p$ and $h_1$ are, respectively, the degree of polynomial and the bandwidth used for estimating the derivatives in the approach of [12].

The choice of bandwidths deserves some attention here. It is clear from (A.8) in the Appendix that when $h/h_1$ is not bounded, the asymptotic normality of $\hat\alpha$ cannot be achieved. When $h/h_1$ is bounded but does not approach 0 as $n \to \infty$, the asymptotic distribution of the first-step estimator will affect the distribution of $\hat\alpha$. When $h/h_1 \to 0$, which we impose here, the asymptotic distribution of the first-step estimator has no impact on the second-step estimator, which makes the expression of the asymptotic variance much simpler than it would be without the condition.

Our main result is stated in the following theorem. 



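Theorem 1. Under conditions 1-5, we have

$$\sqrt{nh}\{\hat\alpha - \alpha - b_n(x_1, x_2)\} \xrightarrow{d} N(0, \sigma^2(x_1, x_2)), \tag{2.10}$$

where

$$b_n(x_1, x_2) = \frac{h^{p+1}}{(p+1)!}\{\psi^{(p+1)}(x_2) - \psi^{(p+1)}(x_1)\}\int u^{p+1}K(u)\,du$$

and

$$\sigma^2(x_1, x_2) = \int K^2(u)\,du\left[\int_0^\infty \frac{a_{x_1}(v)a_{x_2}(v)}{a_{x_1}(v) + a_{x_2}(v)}\lambda_0(v)\,dv\right]^{-1},$$

with $a_x(v) = e^{\psi(x)}f(x)S(v \mid x)$.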
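The only unknown term of the bias, $\psi^{(p+1)}(x)$, can be easily estimated from our first-step estimator of the derivatives if we choose $p_1 > p$. To estimate $\sigma^2$, we observe that if we know $\psi$, then

$$\hat a_{x_l}(v) = n^{-1}\sum_{i=1}^{n} K_h(X_i - x_l)Y_i(v)\exp(\psi(X_i)),$$

where $Y_i(v) = I(Y_i \ge v)$ is the at-risk indicator, converges to $a_{x_l}(v)$ for $l = 1, 2$ and all $v$. In addition, the cumulative baseline hazard function $\Lambda_0(t)$ can be estimated by the Breslow estimator [3, 4],

$$\hat\Lambda_0(t) = \sum_{j:\, t_j^0 \le t} \frac{1}{\sum_{i \in R_j}\exp(\psi(X_i))}.$$

Note that since $\psi$ itself is not estimable, $a_x(\cdot)$ or $\Lambda_0(\cdot)$ is only estimable up to scale. However, multiplying $\exp(\psi)$ by a constant multiplies $a_{x_1}a_{x_2}/(a_{x_1} + a_{x_2})$ by that constant and divides $d\Lambda_0$ by it, so the integral in $\sigma^2$ is unaffected, and we can approximate $\int_0^\infty \{a_{x_1}(v)a_{x_2}(v)/(a_{x_1}(v) + a_{x_2}(v))\}\,d\Lambda_0(v)$ by

$$\frac{1}{n}\sum_{j=1}^{N}\frac{\{\sum_{i \in R_j}K_h(X_i - x_1)\exp(\Delta_i)\}\{\sum_{i \in R_j}K_h(X_i - x_2)\exp(\Delta_i)\}}{\{\sum_{i \in R_j}\exp(\Delta_i)\}\{\sum_{i \in R_j}(K_h(X_i - x_1) + K_h(X_i - x_2))\exp(\Delta_i)\}},$$

where $\Delta_i = \psi(X_i) - \psi(x_1)$ is already estimated with our methodology. Obviously, we can estimate $\sigma^2(x_1, x_2)$ by $\hat\sigma^2(x_1, x_2)$, where

$$\hat\sigma^2(x_1, x_2) = \int K^2(u)\,du \times \left[\frac{1}{n}\sum_{j=1}^{N}\frac{\{\sum_{i \in R_j}K_h(X_i - x_1)\exp(\hat\Delta_i)\}\{\sum_{i \in R_j}K_h(X_i - x_2)\exp(\hat\Delta_i)\}}{\{\sum_{i \in R_j}\exp(\hat\Delta_i)\}\{\sum_{i \in R_j}(K_h(X_i - x_1) + K_h(X_i - x_2))\exp(\hat\Delta_i)\}}\right]^{-1},$$

with $\hat\Delta_i$ an estimate of $\Delta_i$.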
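The plug-in variance estimator can be computed with a single pass of risk-set sums. In the sketch below, `dlt` stands for the estimates $\hat\Delta_i$ (how they are obtained is left open), the kernel is Epanechnikov, for which $\int K^2(u)\,du = 3/5$, and observation times are assumed untied; `sigma2_hat` is a hypothetical name. Adding a constant to every $\hat\Delta_i$ leaves the result unchanged, reflecting the fact that $a_x(\cdot)$ and $\Lambda_0(\cdot)$ are only identified up to scale.

```python
import numpy as np


def sigma2_hat(x, y, delta, dlt, x1, x2, h):
    """Plug-in estimate of sigma^2(x1, x2) in Theorem 1 (sketch).

    dlt[i] is an estimate of Delta_i = psi(X_i) - psi(x1).
    Assumes an Epanechnikov kernel and no tied observation times.
    """
    order = np.argsort(y)
    xs, dl, e = x[order], delta[order], np.exp(dlt[order])

    def kh(t):
        u = t / h
        return 0.75 * np.clip(1.0 - u * u, 0.0, None) / h

    def risk_sums(v):                  # sums over R_j = {i: Y_i >= Y_j}
        return np.cumsum(v[::-1])[::-1]

    s1 = risk_sums(kh(xs - x1) * e)
    s2 = risk_sums(kh(xs - x2) * e)
    s0 = risk_sums(e)
    fail = dl == 1
    num = s1[fail] * s2[fail]
    den = s0[fail] * (s1[fail] + s2[fail])
    keep = den > 0                     # risk sets without kernel mass contribute 0
    info = np.sum(num[keep] / den[keep]) / len(xs)
    int_k2 = 0.6                       # int K^2(u) du for the Epanechnikov kernel
    return int_k2 / info


rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(-1.0, 1.0, n)
y = rng.exponential(1.0, n) * np.exp(-x)   # hypothetical psi(x) = x, no censoring
delta = np.ones(n, dtype=int)
s = sigma2_hat(x, y, delta, x + 0.5, -0.5, 0.5, 0.2)
s_shift = sigma2_hat(x, y, delta, x + 0.5 + 3.0, -0.5, 0.5, 0.2)
print(s, s_shift)   # equal up to rounding
```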

The theoretical optimal bandwidth can be obtained by minimizing the asymptotic weighted mean integrated squared error

$$\iint \Big[\{b_n(x_1, x_2)\}^2 + \frac{\sigma^2(x_1, x_2)}{nh}\Big]w(x_1)w(x_2)\,dx_1\,dx_2,$$

resulting in

$$h_{\rm opt} = C_{0,p}(K)\left[\frac{\iint \Big\{\int_0^\infty \frac{a_{x_1}(v)a_{x_2}(v)}{a_{x_1}(v) + a_{x_2}(v)}\lambda_0(v)\,dv\Big\}^{-1} w(x_1)w(x_2)\,dx_1\,dx_2}{\iint \{\psi^{(p+1)}(x_2) - \psi^{(p+1)}(x_1)\}^2 w(x_1)w(x_2)\,dx_1\,dx_2}\right]^{1/(2p+3)} n^{-1/(2p+3)},$$

where $C_{0,p}(K)$ is a constant depending on $p$ and $K$. The value of $C_{0,p}(K)$ is tabulated in Table 3.2 of [11]. For a detailed discussion of the issue of model complexity, see [11].
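The form of $h_{\rm opt}$ follows from minimizing a criterion of the type $Bh^{2p+2} + V/(nh)$, where $B$ collects the squared-bias integral and $V$ the variance integral: setting the derivative to zero gives $h^{2p+3} = V/\{(2p+2)Bn\}$, and in this derivation the kernel-dependent factors inside $B$ and $V$ combine into the constant in front of the ratio of integrals. The sketch below checks the closed form against a brute-force grid search; `B`, `V`, `n` and `p` are arbitrary placeholder values, not quantities from the paper.

```python
import numpy as np


def amise(h, B, V, n, p):
    """Generic asymptotic MISE: squared-bias term plus variance term."""
    return B * h ** (2 * p + 2) + V / (n * h)


def h_opt(B, V, n, p):
    """Closed-form minimizer of amise(h) over h > 0."""
    return (V / ((2 * p + 2) * B * n)) ** (1.0 / (2 * p + 3))


B, V, n, p = 0.5, 2.0, 300, 1           # placeholder values
grid = np.linspace(0.01, 2.0, 200_001)
h_grid = grid[np.argmin(amise(grid, B, V, n, p))]
print(h_opt(B, V, n, p), h_grid)        # agree to grid resolution
```

Doubling $n$ shrinks the optimal bandwidth by the factor $2^{-1/(2p+3)}$, the familiar $n^{-1/(2p+3)}$ rate.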

Remark 1. Note that when $X$ is a discrete random variable, our local partial likelihood (2.5) reduces to

$$\begin{aligned} &\sum_{j=1}^{N}\Big[J_{1(j)}\psi(x_1) + J_{2(j)}\psi(x_2) - (J_{1(j)} + J_{2(j)})\log\Big\{\sum_{i \in R_j}(J_{1i}\exp\psi(x_1) + J_{2i}\exp\psi(x_2))\Big\}\Big] \\ &\quad = \sum_{j=1}^{N}\Big[J_{2(j)}\alpha - (J_{1(j)} + J_{2(j)})\log\Big\{\sum_{i \in R_j}(J_{1i} + J_{2i}\exp(\alpha))\Big\}\Big], \end{aligned}$$

where $J_{li} = I(X_i = x_l)$, $J_{l(j)} = I(X_{(j)} = x_l)$ for $l = 1, 2$, and $\alpha = \psi(x_2) - \psi(x_1)$. This is, in fact, the partial likelihood for a two-sample comparison of survival times in the form of the proportional hazards model. In other words, our procedure yields an efficient estimator for the case of discrete covariates, since it is well known that the partial likelihood estimator is efficient ([8] and [2]).
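For a discrete covariate the second expression above is the two-sample Cox partial likelihood in $\alpha$, so $\hat\alpha$ solves a one-dimensional score equation. A minimal sketch, assuming untied times and coding $J_{2i}$ as `group[i]` (1 if $X_i = x_2$, 0 if $X_i = x_1$); the function name is hypothetical. The score is decreasing in $\alpha$, so bisection suffices.

```python
import numpy as np


def two_sample_alpha(y, delta, group, lo=-20.0, hi=20.0, iters=100):
    """Maximize sum_j [J_2(j) alpha - log sum_{R_j} (J_1i + J_2i e^alpha)] (no ties).

    Every observation is in one of the two groups, so J_1(j) + J_2(j) = 1.
    """
    order = np.argsort(y)
    dl, g = delta[order], group[order]
    n2 = np.cumsum(g[::-1])[::-1]              # number at risk with X = x2
    n1 = np.arange(len(g), 0, -1) - n2         # number at risk with X = x1
    fail = dl == 1

    def score(a):
        ea = np.exp(a) * n2[fail]
        return np.sum(g[fail]) - np.sum(ea / (n1[fail] + ea))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = np.random.default_rng(3)
n = 2000
group = rng.integers(0, 2, n)                         # J_2i
t = rng.exponential(1.0, n) * np.exp(-0.7 * group)    # hypothetical alpha = 0.7
c = rng.exponential(2.0, n)
y, delta = np.minimum(t, c), (t <= c).astype(int)
print(two_sample_alpha(y, delta, group))              # close to 0.7
```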

Remark 2. Similar to the partial likelihood approach ([3] and [6]) and the local partial likelihood approach of [12] and [23], our version of the local partial likelihood function can also be viewed as a local profile likelihood. Analogous to [12], the local likelihood in our setting can be written as

$$\log L = \sum_{i=1}^{n}\big[\delta_i\{\log\lambda_0(Z_i) + \psi(X_i)\} - \Lambda_0(Z_i)\exp(\psi(X_i))\big]\{K_h(X_i - x_1) + K_h(X_i - x_2)\}. \tag{2.11}$$
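Consider nonparametric modeling for $\Lambda_0(\cdot)$, which has a jump of $\lambda_j$ at $t_j^0$: $\Lambda_0(t, \lambda) = \sum_{j=1}^{N}\lambda_j I\{t_j^0 \le t\}$. Then

$$\Lambda_0(Z_i, \lambda) = \sum_{j=1}^{N}\lambda_j I\{i \in R_j\}.$$

Substituting these two expressions into the local likelihood expression (2.11), we obtain

$$\log L = \sum_{j=1}^{N}\{K_h(X_{(j)} - x_1) + K_h(X_{(j)} - x_2)\}\{\log\lambda_j + \psi(X_{(j)})\} - \sum_{i=1}^{n}\sum_{j=1}^{N}\{K_h(X_i - x_1) + K_h(X_i - x_2)\}\lambda_j I\{i \in R_j\}\exp(\psi(X_i)). \tag{2.12}$$

Then, by maximizing $\log L$ with respect to $\lambda_j$ $(j = 1, \ldots, N)$, we have

$$\hat\lambda_j = \frac{K_h(X_{(j)} - x_1) + K_h(X_{(j)} - x_2)}{\sum_{i \in R_j}\{K_h(X_i - x_1) + K_h(X_i - x_2)\}\exp(\psi(X_i))}.$$

Substituting $\hat\lambda_j$ into (2.12) yields

$$\begin{aligned} \max_{\lambda}\log L &= \sum_{j=1}^{N}\{K_h(X_{(j)} - x_1) + K_h(X_{(j)} - x_2)\}\Big[\psi(X_{(j)}) - \log\sum_{i \in R_j}\{K_h(X_i - x_1) + K_h(X_i - x_2)\}\exp(\psi(X_i))\Big] \\ &\quad + \sum_{j=1}^{N}\{K_h(X_{(j)} - x_1) + K_h(X_{(j)} - x_2)\}\big\{\log[K_h(X_{(j)} - x_1) + K_h(X_{(j)} - x_2)] - 1\big\}. \end{aligned} \tag{2.13}$$

Clearly, maximizing (2.13) is equivalent to maximizing (2.5), since the second sum in (2.13) does not involve $\psi(\cdot)$.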
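The profiling step can be verified numerically: with $\hat\lambda_j$ plugged in, (2.12) equals (2.13), and rescaling $\hat\lambda_j$ lowers the value. The sketch assumes untied times, an Epanechnikov kernel, observed time $Z_i = Y_i$, and a fixed hypothetical $\psi$; failures with zero kernel weight are dropped, since their optimal jump is 0. The function name `profile_check` is hypothetical.

```python
import numpy as np


def profile_check(x, y, delta, psi, x1, x2, h):
    """Return (2.12) at lambda_hat, (2.13), and (2.12) at 1.1 and 0.9 times lambda_hat."""
    order = np.argsort(y)
    xs, dl = x[order], delta[order]
    u1, u2 = (xs - x1) / h, (xs - x2) / h
    W = 0.75 * (np.clip(1 - u1**2, 0, None) + np.clip(1 - u2**2, 0, None)) / h
    ep = np.exp(psi(xs))
    D = np.cumsum((W * ep)[::-1])[::-1]     # sum over R_j of W_i exp(psi(X_i))
    fail = (dl == 1) & (W > 0)

    def loglik(lam):                        # (2.12); lam indexed by sorted position
        Lam = np.cumsum(lam)                # Lambda_0(Y_i) = sum of jumps at t_j <= Y_i
        first = np.sum(W[fail] * (np.log(lam[fail]) + psi(xs[fail])))
        return first - np.sum(W * ep * Lam)

    lam_hat = np.zeros_like(xs)
    lam_hat[fail] = W[fail] / D[fail]
    prof = (np.sum(W[fail] * (psi(xs[fail]) - np.log(D[fail])))
            + np.sum(W[fail] * (np.log(W[fail]) - 1.0)))          # (2.13)
    return loglik(lam_hat), prof, loglik(1.1 * lam_hat), loglik(0.9 * lam_hat)


rng = np.random.default_rng(4)
n = 400
x = rng.uniform(-1.0, 1.0, n)
y = rng.exponential(1.0, n) * np.exp(-x**2)          # hypothetical psi(x) = x^2
delta = (rng.uniform(size=n) < 0.8).astype(int)
at, prof, up, down = profile_check(x, y, delta, lambda v: v**2, -0.5, 0.5, 0.3)
print(at, prof)   # equal up to rounding
```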

Remark 3. One referee pointed out that when estimating $\psi(x_2) - \psi(x_1)$, an alternative approach would be to estimate $\psi(x_2) - \psi(x_3)$ and $\psi(x_3) - \psi(x_1)$ separately and then to combine them, for any point $x_3$ between $x_1$ and $x_2$. In general, these two approaches will lead to different estimates. Theoretical justification could be based on two separate asymptotic linear representations, as in (A.8). In our simulation experiment (not reported here), these two approaches seem to be comparable.
Remark 4. One of the main advantages of the proportional hazards model is that it can easily accommodate time-varying covariates. Time-varying covariates can be incorporated into our approach in a straightforward way by expressing the local partial likelihood through the counting process representation. As our paper is largely based on a comparison with [12], we chose notation similar to that used in [12] to facilitate comparison.

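2.2. Estimating the differences between groups. Our approach can be easily modified to accommodate more realistic situations in which estimating the differences between groups is necessary. Let the risk function be $\psi(x, z)$, where $x$ is continuous and $z$ is discrete, taking two values. We focus on estimating $\psi(x, z_2) - \psi(x, z_1)$. Similar to (2.5), we include observations in the neighborhoods of $(x, z_1)$ and $(x, z_2)$. Our local version of the log-likelihood becomes

$$L_n = \sum_{j=1}^{N} K_h(X_{(j)} - x)\{I(Z_{(j)} = z_1) + I(Z_{(j)} = z_2)\}\Big[\psi(X_{(j)}, Z_{(j)}) - \log\sum_{i \in R_j}\exp(\psi(X_i, Z_i))K_h(X_i - x)\Big].$$

Using polynomial approximation in the neighborhood of $x$, we obtain the following approximation to $L_n$:

$$\begin{aligned} L_n(\rho, \beta_1^*, \beta_2^*) &= \sum_{j=1}^{N} K_h(X_{(j)} - x)\big[I_{1(j)}\{\psi_1(x) + \mathbf{X}_{(j)}^{*T}\beta_1^*\} + I_{2(j)}\{\psi_2(x) + \mathbf{X}_{(j)}^{*T}\beta_2^*\}\big] \\ &\quad - \sum_{j=1}^{N} K_h(X_{(j)} - x)(I_{1(j)} + I_{2(j)})\log\Big[\sum_{i \in R_j} K_h(X_i - x)\{I_{1i}\exp(\psi_1(x) + \mathbf{X}_i^{*T}\beta_1^*) + I_{2i}\exp(\psi_2(x) + \mathbf{X}_i^{*T}\beta_2^*)\}\Big] \\ &= \sum_{j=1}^{N} K_h(X_{(j)} - x)\big[I_{1(j)}\mathbf{X}_{(j)}^{*T}\beta_1^* + I_{2(j)}\{\rho + \mathbf{X}_{(j)}^{*T}\beta_2^*\}\big] \\ &\quad - \sum_{j=1}^{N} K_h(X_{(j)} - x)(I_{1(j)} + I_{2(j)})\log\Big[\sum_{i \in R_j} K_h(X_i - x)\{I_{1i}\exp(\mathbf{X}_i^{*T}\beta_1^*) + I_{2i}\exp(\rho + \mathbf{X}_i^{*T}\beta_2^*)\}\Big], \end{aligned} \tag{2.14}$$

where $\rho = \psi_2(x) - \psi_1(x)$, $I_{ki} = I(Z_i = z_k)$ for $k = 1, 2$ and $i = 1, 2, \ldots, n$, $\psi_k(x) = \psi(x, z_k)$, $\beta_k^* = \{\psi_k'(x), \psi_k''(x)/2, \ldots, \psi_k^{(p)}(x)/p!\}^T$ for $k = 1, 2$, and $\mathbf{X}_i^* = \{X_i - x, (X_i - x)^2, \ldots, (X_i - x)^p\}^T$ for $i = 1, 2, \ldots, n$.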

By following an argument similar to the one in the previous section, we also adopt a two-step strategy here. In the first step, we apply the procedure of [12] to estimate $\beta_1^*$ and $\beta_2^*$ using observations in the neighborhood of $x$ for the two groups separately. In the second step, we estimate $\rho$ by maximizing $L_n(\rho, \hat\beta_1^*, \hat\beta_2^*)$. We now present the following theorem.

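Theorem 2. Let $\hat\rho$ be the maximizer of $L_n(\rho, \hat\beta_1^*, \hat\beta_2^*)$. Under conditions 1-5 of Section 2.1, with conditions 2-4 modified slightly so that they hold at point $x$, we have

$$\sqrt{nh}\{\hat\rho - \rho - b_n(x)\} \xrightarrow{d} N(0, \sigma^2(x)), \tag{2.15}$$

where

$$b_n(x) = \frac{h^{p+1}}{(p+1)!}\{\psi_2^{(p+1)}(x) - \psi_1^{(p+1)}(x)\}\int u^{p+1}K(u)\,du, \qquad \sigma^2(x) = \frac{\int K^2(u)\,du}{f(x)}\Big\{\frac{\sigma_1^2(x)}{p_{1x}} + \frac{\sigma_2^2(x)}{p_{2x}}\Big\},$$

with $p_{kx} = P(Z = z_k \mid X = x)$ and, for $k = 1, 2$,

$$\sigma_k^2(x) = \Big[\exp(\psi_k(x))\int_0^\infty P(Y > v \mid X = x, Z = z_k)\lambda_0(v)\,dv\Big]^{-1} = [E\{\delta \mid X = x, Z = z_k\}]^{-1}.$$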

Similar to the estimation of the bias term for Theorem 1, the only unknown terms $\psi_k^{(p+1)}(x)$ for $k = 1, 2$ are already obtained during the first step when we estimate the derivatives using [12] if we choose $p_1 > p$. To estimate the variance term, we note that $n^{-1}\sum_{i=1}^{n} I_{ki}\delta_i K_h(X_i - x)$ is an asymptotically unbiased estimator of $p_{kx}f(x)/\sigma_k^2(x)$ for $k = 1, 2$. Natural candidates for the estimation of bias and variance are, respectively,

$$\hat b_n(x) = \frac{h^{p+1}}{(p+1)!}\{\hat\psi_2^{(p+1)}(x) - \hat\psi_1^{(p+1)}(x)\}\int u^{p+1}K(u)\,du, \qquad \hat\sigma^2(x) = \int K^2(u)\,du\sum_{k=1}^{2}\Big\{n^{-1}\sum_{i=1}^{n} I_{ki}\delta_i K_h(X_i - x)\Big\}^{-1}. \tag{2.16}$$
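The variance estimate in (2.16) only needs kernel-weighted counts of observed failures in each group. A small sketch, assuming the Epanechnikov kernel ($\int K^2(u)\,du = 3/5$); `sigma2_groups` is a hypothetical name. The example uses four observations with the same covariate value so that the result can be checked by hand: each kernel weight is 0.75, so the group sums are $1.5/4 = 0.375$ and $0.75/4 = 0.1875$, giving $0.6\,(1/0.375 + 1/0.1875) = 4.8$.

```python
import numpy as np


def sigma2_groups(x, delta, z, x0, h, z1, z2):
    """sigma_hat^2(x0) from (2.16) with the Epanechnikov kernel (sketch)."""
    u = (x - x0) / h
    k = 0.75 * np.clip(1.0 - u * u, 0.0, None) / h   # K_h(X_i - x0)
    n = len(x)
    d1 = np.sum((z == z1) * delta * k) / n            # estimates p_1x f(x0) / sigma_1^2(x0)
    d2 = np.sum((z == z2) * delta * k) / n            # estimates p_2x f(x0) / sigma_2^2(x0)
    return 0.6 * (1.0 / d1 + 1.0 / d2)


x = np.zeros(4)
delta = np.array([1, 1, 1, 0])
z = np.array([1, 2, 1, 2])
print(sigma2_groups(x, delta, z, 0.0, 1.0, 1, 2))     # approximately 4.8
```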

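As a consequence of (2.15), the theoretical optimal bandwidth minimizes the asymptotic weighted mean integrated squared error,

$$\int\Big[\Big\{\frac{h^{p+1}}{(p+1)!}(\psi_2^{(p+1)}(x) - \psi_1^{(p+1)}(x))\int u^{p+1}K(u)\,du\Big\}^2 + \frac{\int K^2(u)\,du}{nhf(x)}\Big\{\frac{\sigma_1^2(x)}{p_{1x}} + \frac{\sigma_2^2(x)}{p_{2x}}\Big\}\Big]w(x)\,dx,$$

with some weight function $w \ge 0$. We find that the asymptotically optimal constant bandwidth is given by

$$h_{\rm opt} = C_{0,p}(K)\left[\frac{\int \{1/f(x)\}\{\sigma_1^2(x)/p_{1x} + \sigma_2^2(x)/p_{2x}\}w(x)\,dx}{\int \{\psi_2^{(p+1)}(x) - \psi_1^{(p+1)}(x)\}^2 w(x)\,dx}\right]^{1/(2p+3)} n^{-1/(2p+3)},$$

with $C_{0,p}(K)$ being the same as in Section 2.1.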



(p+i)/ 



Remark 5. Although the risk functions for each group can only be estimated up to a constant, the difference between two groups can be identified. Based on this observation, if we are interested in estimating risk functions for $k$ groups, we need to impose only one condition, such as $\psi_1(0) = 0$, for identifiability. On the other hand, Fan, Gijbels and King [12] needed to estimate the risk functions for each group separately. Therefore, $k$ conditions, $\psi_l(0) = 0$ for $l = 1, 2, \ldots, k$, should be imposed for identification. Sometimes this can be inappropriate, as in the case of analyzing the PBC data, to be discussed in the next section.



3. Numerical studies. Extensive numerical studies were conducted to evaluate the new procedures, and we found that the finite sample performance of our procedure is either comparable to or better than that of [12]. The Epanechnikov kernel is employed in all of the simulation studies, as well as for the analysis of the real data set.

3.1. Simulation studies on estimating relative risk. We compare the two procedures for the following three designs with different risk functions or distributions of the covariate variable $X$:

• Design 1: $X \sim$ Uniform$(-1, 1)$, $\psi(x) = x^2$.
Table 1
Mean integrated squared errors

                    Design 1        Design 2        Design 3
h_1^0   Censoring   [12]    new     [12]    new     [12]    new
0.15    0%          0.181   0.143   0.324   0.195   0.262   0.153
        30%         0.271   0.212   0.384   0.259   0.353   0.205
0.25    0%          0.091   0.084   0.165   0.140   0.141   0.088
        30%         0.129   0.119   0.213   0.176   0.197   0.118
0.35    0%          0.062   0.062   0.152   0.148   0.104   0.068
        30%         0.084   0.085   0.182   0.171   0.155   0.093
• Design 2: $X \sim$ Uniform$(-1, 1)$, $\psi(x) = x^2 + \exp\{-150(x + 0.3)^2\} + \exp\{-150(x - 0.3)^2\}$.

• Design 3: $\psi(x) = x^2$. Half of the $X$ are from $N(-0.6, 0.3^2)$ truncated at $-1$ and $0$, the other half from $N(0.6, 0.3^2)$ truncated at $0$ and $1$. Note the sparsity of data in the neighborhood of 0.

The survival time is set to be $\exp\{-\psi(X) + \varepsilon\}$, where $\varepsilon$ is from the standard extreme-value distribution. This is justified by the well-known equivalence of the proportional hazards model (1.1) to the transformation model $\log\Lambda_0(T) = -\psi(X) + \varepsilon$, where $\varepsilon$ is a standard extreme-value random variable. In addition, the censoring variable is assumed to be uniform on $(0, c)$, where $c$ is chosen to give a prespecified censoring proportion (viz., 0% and 30%). For each combination of $c$ and $\psi$, we simulate 500 realizations of $\{(X_i, Y_i, \delta_i), i = 1, 2, \ldots, 300\}$. Let $h_1^0$, $h_2^0$ and $h_3^0$ denote the bandwidths used for [12] for designs 1, 2 and 3, respectively; we tried three different values for $h_1^0$, namely, 0.15, 0.25 and 0.35, for which their approach yields reasonable estimates. Some adjustments were made in choosing $h_2^0$ and $h_3^0$, due to some unique features involving $\psi(\cdot)$ or the distribution of $X$. We set $h_2^0 = h_1^0$ when $|x| > 0.5$ and $h_2^0 = 0.8h_1^0$ when $|x| < 0.5$. The reason for a smaller bandwidth for $|x| < 0.5$ is similar to the idea of variable bandwidth [10]. For design 3, we set $h_3^0 = h_1^0$ when $|x| > 0.2$ and $h_3^0 = 2h_1^0$ when $|x| < 0.2$. The doubling of the bandwidth in the neighborhood of zero ensures enough data in that neighborhood. The bandwidths for our procedure are always set to be $h = 0.8h^0$ for $h^0 = h_1^0$, $h_2^0$ and $h_3^0$ for the three designs. The mean integrated squared errors (MISE) of the function estimates of $\psi(\cdot) - \psi(0)$ for designs 1 and 2 and of $\psi(\cdot) - \psi(-0.6)$ for design 3 are reported in Table 1. From the table, we find that our procedure is slightly better than [12] for design 1 and performs better for designs 2 and 3.
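The data-generating mechanism can be sketched as follows. Because $T = \exp\{-\psi(X) + \varepsilon\}$ with $\varepsilon$ standard extreme-value is the same as $T = E\exp\{-\psi(X)\}$ with $E \sim$ Exp(1), the sketch draws exponential variables; the censoring bound $c$ is calibrated by bisection on a large Monte Carlo sample. The quadratic risk function in the example, the calibration routine and all function names are our assumptions for illustration, not the authors' simulation code.

```python
import numpy as np


def sample_design3_x(n, rng):
    """Half from N(-0.6, 0.3^2) truncated to (-1, 0), half from N(0.6, 0.3^2)
    truncated to (0, 1), by rejection sampling."""
    parts = []
    for mu, lo, hi, m in [(-0.6, -1.0, 0.0, n // 2), (0.6, 0.0, 1.0, n - n // 2)]:
        draws = np.empty(0)
        while draws.size < m:
            z = rng.normal(mu, 0.3, 2 * m)
            draws = np.concatenate([draws, z[(z > lo) & (z < hi)]])
        parts.append(draws[:m])
    return np.concatenate(parts)


def simulate(psi, x, c, rng):
    """T = exp(-psi(X) + eps) with eps = log(Exp(1)); C ~ Uniform(0, c).

    c = np.inf means no censoring.
    """
    eps = np.log(rng.exponential(1.0, x.size))
    t = np.exp(-psi(x) + eps)
    if np.isfinite(c):
        cens = rng.uniform(0.0, c, x.size)
    else:
        cens = np.full(x.size, np.inf)
    return np.minimum(t, cens), (t <= cens).astype(int)


def calibrate_c(psi, draw_x, target, rng, n_mc=200_000):
    """Bisection (on the log scale) for c giving censoring proportion ~ target."""
    x = draw_x(n_mc, rng)
    t = np.exp(-psi(x)) * rng.exponential(1.0, n_mc)
    u = rng.uniform(0.0, 1.0, n_mc)        # common random numbers: C = c * u
    lo, hi = 1e-3, 1e3
    for _ in range(100):
        mid = np.sqrt(lo * hi)
        if np.mean(t > mid * u) > target:
            lo = mid                       # too much censoring: enlarge c
        else:
            hi = mid
    return np.sqrt(lo * hi)


rng = np.random.default_rng(5)
psi = lambda v: v ** 2                      # hypothetical risk function
draw_u = lambda m, r: r.uniform(-1.0, 1.0, m)
c30 = calibrate_c(psi, draw_u, 0.30, rng)
x = draw_u(100_000, rng)
y, delta = simulate(psi, x, c30, rng)
print(c30, 1 - delta.mean())                # censoring proportion near 0.30
x3 = sample_design3_x(300, rng)
```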

We now consider the biases of the estimates. The function estimates are shown in Figures 1-3 (FGK denotes the function estimates by Fan, Gijbels and King [12]) for the case of $h_1^0 = 0.25$ and 30% censoring; similar results are observed for $h_1^0 = 0.15$ and 0.35 and are thus omitted here for brevity. Figures 1 and 2 and other unreported results suggest that the biases of the two procedures for designs 1 and 2 are comparable. For design 2, both procedures have some biases in the neighborhoods of $\pm 0.3$ due to the peaks of $\psi$ at these two points. Figure 3 suggests obvious bias in [12] when $x > -0.2$ and also some bias in our procedure in the neighborhood of 0. Note that for design 3, poor performance in the neighborhood of 0 is to be expected, due to the sparsity of data.

The mean squared errors at various points for 30% censoring are reported in Table 2 to facilitate a more detailed comparison between the two approaches. We now take a close look at Table 2 to better understand the advantages of our procedure for designs 2 and 3. We find that the two procedures are largely comparable in terms of mean squared errors at all the points selected for design 1. For design 2, we observe a similar pattern for $|x| < 0.4$. For $|x| > 0.4$, our procedure performs better than [12]. For design 3, the performance of the two procedures is comparable for $x < 0.2$, and our procedure outperforms [12] when $x > 0.2$.

The better performance of our procedure for designs 2 and 3 in certain regions is largely due to the way the two estimators are constructed. In the case of design 2, due to the difficulty in estimating $\psi'(\cdot)$ in the neighborhoods of the two peaks at $x = \pm 0.3$, neither procedure is expected to estimate the function well in the neighborhoods of these two points. But, for [12], the estimates of $\psi(x) - \psi(0)$ based on $\int_0^x \hat\psi'(u)\,du$ will be adversely affected if 0.3 or $-0.3$ lies between 0 and $x$. On the other hand, our procedure performs well, provided $\psi'(\cdot)$ can be estimated well in the neighborhoods of 0 and $x$. Similar arguments also apply to design 3, for which $\psi'(\cdot)$ is not expected to be estimated well in the neighborhood of 0 due to the sparsity of data, whereas $\psi'(x)$ can be better estimated when $x$ is close to 0.6 or $-0.6$. This explains the obvious advantage of our procedure in the estimation of $\psi(x) - \psi(-0.6)$ in the neighborhood of $x = 0.6$ and the relatively poor performance of [12] for all $x > 0$.

Fig. 1. Design 1 (horizontal axis: X value).

Fig. 2. Design 2.



3.2. Application of estimation of the treatment effect. We apply our pro- 
cedure to the estimation of the treatment effect in an analysis of the Primary 
Biliary Cirrhosis (PBC) data set. Our procedure offers a natural approach 
to this particular problem. Basically, we estimate the treatment effect using 
data with bilirubin values in a neighborhood of each point of estimation. A 
detailed description of the PBC data can be found in Chapter 4 of [13]. A 
total of 312 patients participated in the randomized trial. Of the randomized 
patients, 187 cases (60%) were censored. 

Fan and Gijbels [11] investigated the effect of treatment differences by dividing the data into two groups according to the treatment code. For each treatment group, model (2.4) was fitted using log(Bilirubin) as a covariate. The resulting curves are reproduced in Figure 4 for the sake of comparison. It is worth pointing out that, with [12], treatment differences can be estimated only up to a constant. Figure 4 implicitly assumes that the risk functions of the two treatment groups are equal at the left endpoint of the support of the covariate. There is no justification for this assumption since, while the risk functions themselves are not identifiable, the difference can be estimated following our approach. Based on Figure 4, Fan and Gijbels [11] suggested that the treatment effect is present.

Fig. 3. Design 3.

Table 2
Mean squared errors with 30% censoring

x                     -1.0   -0.8   -0.6   -0.4   -0.2    0.0    0.2    0.4    0.6    0.8    1.0
Design 1
h_1^0 = 0.15  [12]   0.745  0.410  0.363  0.350  0.338  0.000  0.341  0.349  0.370  0.359  0.602
              new    0.652  0.363  0.327  0.338  0.292  0.000  0.338  0.334  0.348  0.328  0.473
h_1^0 = 0.25  [12]   0.489  0.272  0.255  0.252  0.198  0.000  0.207  0.264  0.253  0.255  0.425
              new    0.471  0.262  0.251  0.255  0.197  0.000  0.215  0.261  0.248  0.245  0.380
h_1^0 = 0.35  [12]   0.406  0.241  0.210  0.186  0.121  0.000  0.125  0.192  0.200  0.216  0.356
              new    0.424  0.244  0.211  0.192  0.106  0.000  0.132  0.196  0.201  0.214  0.348
Design 2
h_1^0 = 0.15  [12]   0.837  0.484  0.448  0.451  0.418  0.000  0.403  0.429  0.430  0.421  0.650
              new    0.681  0.366  0.328  0.382  0.421  0.000  0.386  0.379  0.333  0.312  0.487
h_1^0 = 0.25  [12]   0.529  0.332  0.321  0.305  0.279  0.000  0.277  0.311  0.307  0.312  0.438
              new    0.496  0.269  0.263  0.288  0.284  0.000  0.271  0.302  0.257  0.265  0.388
h_1^0 = 0.35  [12]   0.424  0.273  0.243  0.232  0.176  0.000  0.188  0.249  0.242  0.243  0.387
              new    0.401  0.258  0.224  0.237  0.181  0.000  0.192  0.257  0.216  0.222  0.365
Design 3
h_1^0 = 0.15  [12]   0.758  0.302  0.000  0.298  0.388  0.401  0.366  0.502  0.467  0.488  0.707
              new    0.620  0.308  0.000  0.296  0.267  0.300  0.390  0.292  0.265  0.300  0.511
h_1^0 = 0.25  [12]   0.490  0.194  0.000  0.178  0.295  0.311  0.288  0.383  0.381  0.372  0.527
              new    0.489  0.198  0.000  0.180  0.170  0.201  0.295  0.232  0.209  0.218  0.371
h_1^0 = 0.35  [12]   0.424  0.149  0.000  0.137  0.249  0.268  0.270  0.330  0.371  0.382  0.461
              new    0.444  0.159  0.000  0.140  0.117  0.144  0.242  0.206  0.189  0.206  0.314

Following [11], we take as the response the time (in days) between
registration and death, or the time to censoring (liver transplantation or
being alive at the time of the study analysis), and we take the natural
logarithm of serum bilirubin (in mg/dl) as the continuous covariate. We use the
local partial likelihood method (2.14) with $p = 1$. The derivatives $\beta_1$
and $\beta_2$ are estimated separately using the approach of [12] with
$p_1 = 2$ and bandwidth $h_1 = 1.2$, the same bandwidth used to produce
Figure 5.9 of [11]. For our second-step estimator, the bandwidth is
$h = 0.8h_1$. Our 95% confidence interval for $\rho$ is constructed as

$$\hat\rho - \bar b_{1n}(x) \pm \Phi^{-1}(1 - \alpha/2)\,\sigma(x),$$
[Figure 4. Estimates by Fan, Farmen and Gijbels. Horizontal axis: log(Bilirubin).]

[Figure 5. Estimates and 95% C.I.s. Legend: Estimate, 95% C.I.; horizontal axis: log(Bilirubin).]



where $b_{1n}(y)$ and $\sigma(x)$ are defined in (2.16), $\Phi$ is the
standard normal distribution function, $\alpha = 0.05$, and

$$\bar b_{1n}(x) = \int b_{1n}(y)\,K_h(y - x)\,dy$$

is a local weighted average of the estimated bias. The averaging mainly serves
to keep the estimated bias function, which involves estimating a higher-order
derivative curve, from changing abruptly. A similar smoothing idea was adopted
by Fan, Farmen and Gijbels [9]. The results are shown in Figure 5.
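The interval above can be sketched numerically as follows. This is a minimal illustration, not the authors' code: the Epanechnikov kernel, the trapezoidal rule used for the integral defining $\bar b_{1n}(x)$, the helper names `smoothed_bias` and `interval`, and all input values (a constant pointwise bias of 0.1, $\hat\rho = 0.5$, $\sigma(x) = 0.2$) are assumptions chosen only to show the mechanics.

```python
import numpy as np
from statistics import NormalDist


def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1] (an assumed kernel choice)."""
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)


def trapezoid(f, grid):
    """Trapezoidal rule for samples f on an increasing grid."""
    return float(np.sum((f[1:] + f[:-1]) * np.diff(grid)) / 2.0)


def smoothed_bias(grid, bias, h):
    """b_bar(x) = integral of b(y) K_h(y - x) dy, evaluated at every grid point x."""
    out = np.empty_like(bias)
    for k, x in enumerate(grid):
        weights = epanechnikov((grid - x) / h) / h   # K_h(y - x) on the grid
        out[k] = trapezoid(bias * weights, grid)
    return out


def interval(rho_hat, bias_bar, sigma, level=0.95):
    """rho_hat - b_bar +/- Phi^{-1}(1 - alpha/2) * sigma, with alpha = 1 - level."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    center = rho_hat - bias_bar
    return center - z * sigma, center + z * sigma


# Hypothetical inputs: a constant pointwise bias on a grid of log(bilirubin) values.
grid = np.linspace(-1.0, 3.0, 401)
bias = np.full_like(grid, 0.1)
bbar = smoothed_bias(grid, bias, h=0.96)        # h = 0.8 * h1 with h1 = 1.2
lo, hi = interval(rho_hat=0.5, bias_bar=0.1, sigma=0.2)
print(bbar[200], bbar[0], lo, hi)
```

Near the ends of the grid the unnormalized average $\int b_{1n}(y)K_h(y - x)\,dy$ shrinks toward zero because part of the kernel mass falls outside the grid (at the left endpoint it is about half the interior value). Dividing by $\int K_h(y - x)\,dy$ would be one way to avoid this, but that variant is our own and is not part of the procedure described above.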

Figure 5 shows that, contrary to the findings of [11], there is no evidence of
a treatment effect. The only sign of a treatment effect is in the range of
negative covariate values. However, a close inspection of the data set reveals
that the estimates for negative covariate values are very unreliable, because
the censoring percentage there is very high (84%), compared with 48% for
positive covariate values.



4. Conclusion. In this paper, we have considered direct estimation of
the relative risk function in the proportional hazards model through a new
version of the local partial likelihood. Our procedure was extended to the
case where discrete covariates, such as treatment group indicators, are also
present. We found that, in estimating the relative risk function, our
procedure is either comparable to or outperforms the estimator proposed by Fan,
Gijbels and King [12]. We applied our procedure to estimating the treatment
effect in the PBC data and found that, consistent with findings from
parametric analyses, there is no evidence of a treatment effect, contrary to
the findings of Fan and Gijbels [11].

APPENDIX: PROOFS 



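Proof of Theorem 1. Set

$$\begin{aligned}
&u^* = (u, u^2, \ldots, u^p)^T, \qquad \nu_1 = \int u^* K(u)\,du, \qquad H = \operatorname{diag}(h, h^2, \ldots, h^p), \\
&a_x(v) = e^{\psi(x)} f(x) S(v\mid x), \qquad \gamma(v) = \frac{a_{x_2}(v)}{a_{x_1}(v) + a_{x_2}(v)}, \\
&c_x(\tau) = \int_0^\tau a_x(v)\lambda_0(v)\,dv, \qquad \kappa(\tau) = \int_0^\tau \frac{a_{x_1}(v)\,a_{x_2}(v)}{a_{x_1}(v) + a_{x_2}(v)}\,\lambda_0(v)\,dv.
\end{aligned} \tag{A.0}$$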

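We first prove consistency. Define

$$N_i(t) = I\{Y_i \le t,\ \delta_i = 1\} \quad\text{and}\quad Y_i(t) = I\{Y_i \ge t\}. \tag{A.1}$$

Let the filtration $\mathcal{F}_{nt}$ be the statistical information accruing
during the time interval $[0, t]$, namely

$$\mathcal{F}_{nt} = \sigma\{X_i, N_i(v), Y_i(v+),\ i = 1, \ldots, n,\ 0 \le v \le t\}.$$

Then, under the independent censoring scheme,

$$M_i(t) = N_i(t) - \int_0^t Y_i(v)\exp\{\psi(X_i)\}\lambda_0(v)\,dv \tag{A.2}$$

is an $\mathcal{F}_{nt}$-martingale.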
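As a quick sanity check of the martingale property of (A.2), the following Monte Carlo sketch simulates data from a hypothetical design ($\psi(x) = x$, $\lambda_0 \equiv 1$, $X$ uniform on $[-1, 1]$, independent exponential censoring with rate 0.5) and evaluates $M_i(1)$. With a constant baseline hazard the compensator reduces to $\exp\{\psi(X_i)\}\lambda_0\min(Y_i, t)$. The design and the helper `martingale_at` are illustrative assumptions, not part of the proof.

```python
import numpy as np


def martingale_at(t, y, delta, psi_x, lam0=1.0):
    """M_i(t) = N_i(t) - int_0^t Y_i(v) exp{psi(X_i)} lam0 dv for a constant baseline hazard lam0."""
    counting = ((y <= t) & (delta == 1)).astype(float)   # N_i(t) = I{Y_i <= t, delta_i = 1}
    # Y_i(v) = I{Y_i >= v}, so the compensator integral equals exp(psi) * lam0 * min(Y_i, t)
    compensator = np.exp(psi_x) * lam0 * np.minimum(y, t)
    return counting - compensator


rng = np.random.default_rng(1)
n = 200_000
x = rng.uniform(-1.0, 1.0, n)
psi_x = x                                        # hypothetical risk function psi(x) = x
t_fail = rng.exponential(1.0 / np.exp(psi_x))   # hazard lambda0 * exp(psi(x)) with lambda0 = 1
cens = rng.exponential(2.0, n)                   # independent censoring with rate 0.5
y = np.minimum(t_fail, cens)
delta = (t_fail <= cens).astype(int)

m = martingale_at(1.0, y, delta, psi_x)
print(m.mean())   # close to zero
```

Since a martingale starting at zero has mean zero, the sample mean of $M_i(1)$ should be close to zero, up to Monte Carlo error of order $n^{-1/2}$.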

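By (2.6)-(2.8), $\hat\theta_0 = \hat\alpha - \alpha$ maximizes
$l_n(\theta_0, \hat\theta_1, \hat\theta_2)$, where

$$\begin{aligned}
l_n(\theta_0, \theta_1, \theta_2) ={}& \int_0^\infty n^{-1}\sum_{i=1}^n \bigl[K_h(X_i - x_1)(X_{1i}^{*T}\beta_1^* + U_{1i}^{*T}\theta_1) \\
&\qquad + K_h(X_i - x_2)(X_{2i}^{*T}\beta_2^* + U_{2i}^{*T}\theta_2 + \theta_0)\bigr]\,dN_i(v) \\
&- \int_0^\infty \log\{n S_{n,0}(\theta_0, \theta_1, \theta_2, v)\}\, n^{-1}\sum_{i=1}^n \{K_h(X_i - x_1) + K_h(X_i - x_2)\}\,dN_i(v).
\end{aligned}$$

Here, $U_{1i}^* = H^{-1}X_{1i}^*$, $U_{2i}^* = H^{-1}X_{2i}^*$, and

$$\begin{aligned}
S_{n,0}(\theta_0, \theta_1, \theta_2, v) = n^{-1}\sum_{j=1}^n Y_j(v)\bigl[&\exp(X_{1j}^{*T}\beta_1^* + U_{1j}^{*T}\theta_1)K_h(X_j - x_1) \\
&+ \exp(\alpha + \theta_0)\exp(X_{2j}^{*T}\beta_2^* + U_{2j}^{*T}\theta_2)K_h(X_j - x_2)\bigr].
\end{aligned}$$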
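With a slight abuse of notation, let $\hat\theta_0$ maximize
$l_n(\theta_0, \hat\theta_1, \hat\theta_2, \tau)$, where

$$\begin{aligned}
l_n(\theta_0, \theta_1, \theta_2, \tau) ={}& \int_0^\tau n^{-1}\sum_{i=1}^n \bigl[K_h(X_i - x_1)(X_{1i}^{*T}\beta_1^* + U_{1i}^{*T}\theta_1) \\
&\qquad + K_h(X_i - x_2)(X_{2i}^{*T}\beta_2^* + U_{2i}^{*T}\theta_2 + \theta_0)\bigr]\,dN_i(v) \\
&- \int_0^\tau \log\{n S_{n,0}(\theta_0, \theta_1, \theta_2, v)\}\, n^{-1}\sum_{i=1}^n \{K_h(X_i - x_1) + K_h(X_i - x_2)\}\,dN_i(v).
\end{aligned} \tag{A.3}$$

Our case corresponds to $\tau = \infty$.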
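To make the two-neighbourhood construction concrete, here is a minimal sketch of maximizing a local partial likelihood of the form (A.3) in the single parameter $a = \alpha + \theta_0$, with $p = 1$, $\theta_1 = \theta_2 = 0$, and the first-step slopes replaced by known values. Adding the constant $\alpha$ to the $\theta_0$ term does not change the maximizer, and terms free of $a$ are dropped. The Epanechnikov kernel, the simulated design ($\psi(x) = x$, so $\psi(x_2) - \psi(x_1) = 1$), the bandwidth, and the helper `local_pl_difference` are hypothetical choices for illustration, not the paper's implementation.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1] (an assumed kernel choice)."""
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)


def local_pl_difference(y, delta, x, x1, x2, h, slope1, slope2, max_iter=100):
    """Maximise a p = 1 two-neighbourhood local partial likelihood in a = alpha + theta0.

    slope1 and slope2 stand in for the first-step derivative estimates, i.e.
    X*_{ki}^T beta*_k = slope_k * (X_i - x_k). Continuous times (no ties) are assumed.
    """
    k1 = epanechnikov((x - x1) / h) / h
    k2 = epanechnikov((x - x2) / h) / h
    c1 = k1 * np.exp(slope1 * (x - x1))
    c2 = k2 * np.exp(slope2 * (x - x2))
    order = np.argsort(y)
    # Reverse cumulative sums over sorted times give risk-set totals over {j : Y_j >= Y_i}.
    r1 = np.cumsum(c1[order][::-1])[::-1]
    r2 = np.cumsum(c2[order][::-1])[::-1]
    w = (k1 + k2)[order]                    # weight K_h(X_i - x1) + K_h(X_i - x2) on dN_i
    k2o = k2[order]
    keep = (delta[order] == 1) & (w > 0)    # events inside at least one neighbourhood
    r1, r2, w, k2o = r1[keep], r2[keep], w[keep], k2o[keep]

    a = 0.0
    for _ in range(max_iter):
        share = np.exp(a) * r2 / (r1 + np.exp(a) * r2)   # S_{n,1} / S_{n,0} at each event time
        score = np.sum(k2o - w * share)                   # derivative of the log likelihood in a
        info = np.sum(w * share * (1.0 - share))          # minus its second derivative (> 0)
        step = float(np.clip(score / info, -1.0, 1.0))    # damped Newton step
        a += step
        if abs(step) < 1e-10:
            break
    return a


# Hypothetical design: psi(x) = x, lambda0 = 1, independent exponential censoring.
rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(-1.0, 1.0, n)
t_fail = rng.exponential(1.0 / np.exp(x))
cens = rng.exponential(2.0, n)
y = np.minimum(t_fail, cens)
delta = (t_fail <= cens).astype(int)

est = local_pl_difference(y, delta, x, x1=-0.5, x2=0.5, h=0.2, slope1=1.0, slope2=1.0)
print(est)   # should be near psi(0.5) - psi(-0.5) = 1
```

Because $l_n$ is concave in $\theta_0$, a damped Newton iteration started at zero is a natural choice here; events outside both neighbourhoods receive zero weight and are skipped.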

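Since $\hat\theta_1 \to 0$ and $\hat\theta_2 \to 0$ in probability, similarly
to (6.26) in [12], we can show that, for any $\theta_0$,

$$l_n(\theta_0, \hat\theta_1, \hat\theta_2, \tau) = l_n(\theta_0, 0, 0, \tau) + o_p(1).$$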

Let

$$S_{kn}(v) = n^{-1}\sum_{i=1}^n Y_i(v)\exp\{\psi(X_i)\}K_h(X_i - x_k), \qquad k = 1, 2.$$

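Then

$$l_n(\theta_0, 0, 0, \tau) - l_n(0, 0, 0, \tau) = A_n(\theta_0, 0, 0, \tau) + X_n(\theta_0, 0, 0, \tau),$$

where

$$A_n(\theta_0, 0, 0, \tau) = \int_0^\tau S_{2n}(v)\theta_0\lambda_0(v)\,dv - \int_0^\tau \log\Bigl\{\frac{S_{n,0}(\theta_0,0,0,v)}{S_{n,0}(0,0,0,v)}\Bigr\}\{S_{1n}(v) + S_{2n}(v)\}\lambda_0(v)\,dv,$$

$$\begin{aligned}
X_n(\theta_0, 0, 0, \tau) ={}& \int_0^\tau n^{-1}\sum_{i=1}^n K_h(X_i - x_2)\theta_0\,dM_i(v) \\
&- \int_0^\tau \log\Bigl\{\frac{S_{n,0}(\theta_0,0,0,v)}{S_{n,0}(0,0,0,v)}\Bigr\}\, n^{-1}\sum_{i=1}^n \{K_h(X_i - x_1) + K_h(X_i - x_2)\}\,dM_i(v).
\end{aligned}$$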

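The process $X_n(\theta_0, 0, 0, \cdot)$ is a locally integrable martingale with
predictable variation process

$$\begin{aligned}
B_n(t) ={}& \langle X_n(\theta_0,0,0,t), X_n(\theta_0,0,0,t)\rangle \\
={}& \int_0^t n^{-2}\sum_{i=1}^n \Bigl[K_h(X_i - x_2)\theta_0 - \{K_h(X_i - x_1) + K_h(X_i - x_2)\}\log\Bigl\{\frac{S_{n,0}(\theta_0,0,0,v)}{S_{n,0}(0,0,0,v)}\Bigr\}\Bigr]^2 \\
&\qquad\times Y_i(v)\exp\{\psi(X_i)\}\lambda_0(v)\,dv.
\end{aligned}$$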
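By Lemma 1 of [12], it can be shown that

$$E X_n^2(\theta_0, 0, 0, t) = E B_n(t) = O(n^{-1}h^{-1}) \quad\text{for } 0 < t \le \tau,$$

and

$$\begin{aligned}
S_{n,0}(\theta_0, 0, 0, v) &= S(v\mid x_1)f(x_1) + \exp(\alpha + \theta_0)S(v\mid x_2)f(x_2) + o_p(1) \\
&= \exp\{-\psi(x_1)\}\{a_{x_1}(v) + \exp(\theta_0)a_{x_2}(v)\} + o_p(1), \\
S_{1n}(v) &= a_{x_1}(v) + o_p(1), \qquad S_{2n}(v) = a_{x_2}(v) + o_p(1).
\end{aligned}$$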

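Therefore,

$$A_n(\theta_0, 0, 0, \tau) = A(\theta_0, 0, 0, \tau) + o_p(1),$$

where

$$A(\theta_0, 0, 0, \tau) = \theta_0\int_0^\tau a_{x_2}(v)\lambda_0(v)\,dv - \int_0^\tau \log\Bigl\{\frac{a_{x_1}(v) + \exp(\theta_0)a_{x_2}(v)}{a_{x_1}(v) + a_{x_2}(v)}\Bigr\}\{a_{x_1}(v) + a_{x_2}(v)\}\lambda_0(v)\,dv.$$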
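Consequently,

$$l_n(\theta_0, \hat\theta_1, \hat\theta_2, \tau) - l_n(0, 0, 0, \tau) = A(\theta_0, 0, 0, \tau) + o_p(1).$$

It can easily be shown that $A(\theta_0, 0, 0, \tau)$ is a strictly concave
function of $\theta_0$ with a unique maximum at $\theta_0 = 0$: its derivative
at $\theta_0 = 0$ is $\int_0^\tau a_{x_2}\lambda_0\,dv - \int_0^\tau \gamma(v)\{a_{x_1} + a_{x_2}\}\lambda_0\,dv = 0$,
and its second derivative at $\theta_0 = 0$ equals $-\kappa(\tau) < 0$. As
$l_n(\theta_0, \hat\theta_1, \hat\theta_2, \tau)$ is a concave function of
$\theta_0$, the convexity lemma [1] gives $\hat\theta_0 \to 0$ in probability.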
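The claim that $A(\theta_0, 0, 0, \tau)$ is strictly concave with its maximum at $\theta_0 = 0$ can be checked numerically. The sketch below uses hypothetical curves $a_{x_1}(v) = 0.3e^{-1.1v}$, $a_{x_2}(v) = 0.8e^{-2.1v}$, $\lambda_0 \equiv 1$ and $\tau = 3$ (none of these come from the paper), evaluates $A$ with the trapezoidal rule, and compares a finite-difference second derivative at zero with $-\kappa(\tau)$ from (A.0).

```python
import numpy as np


def trapezoid(f, v):
    """Trapezoidal rule on an increasing grid."""
    return float(np.sum((f[1:] + f[:-1]) * np.diff(v)) / 2.0)


v = np.linspace(0.0, 3.0, 3001)     # integration grid on [0, tau] with tau = 3 (assumed)
lam0 = np.ones_like(v)               # hypothetical baseline hazard lambda0(v) = 1
a1 = 0.3 * np.exp(-1.1 * v)          # hypothetical a_{x1}(v)
a2 = 0.8 * np.exp(-2.1 * v)          # hypothetical a_{x2}(v)


def limit_A(theta0):
    """A(theta0,0,0,tau) = theta0 * int a2 lam0 - int log{(a1 + e^theta0 a2)/(a1 + a2)} (a1 + a2) lam0."""
    log_ratio = np.log((a1 + np.exp(theta0) * a2) / (a1 + a2))
    return theta0 * trapezoid(a2 * lam0, v) - trapezoid(log_ratio * (a1 + a2) * lam0, v)


kappa = trapezoid(a1 * a2 / (a1 + a2) * lam0, v)   # kappa(tau) from (A.0)
thetas = np.linspace(-2.0, 2.0, 401)
values = np.array([limit_A(t) for t in thetas])
eps = 1e-3
second = (limit_A(eps) - 2.0 * limit_A(0.0) + limit_A(-eps)) / eps ** 2
print(thetas[np.argmax(values)], second, -kappa)
```

The grid maximizer sits at zero, the second differences of `values` are all negative, and the curvature at zero matches $-\kappa(\tau)$, which is the quantity that reappears in (A.5).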

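We now consider asymptotic normality. Note that

$$\begin{aligned}
0 = \frac{\partial l_n(\hat\theta_0, \hat\theta_1, \hat\theta_2, \tau)}{\partial\theta_0} ={}& \frac{\partial l_n(0,0,0,\tau)}{\partial\theta_0} + \frac{\partial^2 l_n(\bar\theta_0, \bar\theta_1, \bar\theta_2, \tau)}{\partial\theta_0^2}\hat\theta_0 \\
&+ \frac{\partial^2 l_n(\bar\theta_0, \bar\theta_1, \bar\theta_2, \tau)}{\partial\theta_0\,\partial\theta_1^T}\hat\theta_1 + \frac{\partial^2 l_n(\bar\theta_0, \bar\theta_1, \bar\theta_2, \tau)}{\partial\theta_0\,\partial\theta_2^T}\hat\theta_2,
\end{aligned} \tag{A.4}$$

where $\bar\theta_k$ lies on the line segment from 0 to $\hat\theta_k$ for
$k = 0, 1, 2$. By making use of Lemma 1 of [12], it is straightforward
(although tedious) to show that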

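$$\begin{aligned}
\frac{\partial^2 l_n(\bar\theta_0, \bar\theta_1, \bar\theta_2, \tau)}{\partial\theta_0^2} &= -\kappa(\tau) + o_p(1), \\
\frac{\partial^2 l_n(\bar\theta_0, \bar\theta_1, \bar\theta_2, \tau)}{\partial\theta_0\,\partial\theta_1^T} &= \kappa(\tau)\nu_1^T + o_p(1), \\
\frac{\partial^2 l_n(\bar\theta_0, \bar\theta_1, \bar\theta_2, \tau)}{\partial\theta_0\,\partial\theta_2^T} &= -\kappa(\tau)\nu_1^T + o_p(1),
\end{aligned} \tag{A.5}$$

where $\kappa(\tau)$ and $\nu_1$ are defined in (A.0). Furthermore, we prove at
the end of the Appendix that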

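$$\frac{\partial l_n(0,0,0,\tau)}{\partial\theta_0} = U_{0n}(\tau) + b_{0n}(\tau) + o_p(h^{p+1}), \tag{A.6}$$

where

$$\begin{aligned}
U_{0n}(\tau) &= n^{-1}\sum_{i=1}^n \int_0^\tau \Bigl\{K_h(X_i - x_2) - \frac{S_{n,1}(0,0,0,v)}{S_{n,0}(0,0,0,v)}[K_h(X_i - x_1) + K_h(X_i - x_2)]\Bigr\}\,dM_i(v), \\
b_{0n}(\tau) &= \frac{h^{p+1}}{(p+1)!}\{\psi^{(p+1)}(x_2) - \psi^{(p+1)}(x_1)\}\,\kappa(\tau)\int u^{p+1}K(u)\,du.
\end{aligned} \tag{A.7}$$

Here,

$$S_{n,1}(\theta_0, \theta_1, \theta_2, v) = n^{-1}\sum_{i=1}^n Y_i(v)\exp(\alpha + \theta_0)\exp(X_{2i}^{*T}\beta_2^* + U_{2i}^{*T}\theta_2)K_h(X_i - x_2).$$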



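It follows from (A.4)-(A.6) that

$$\hat\theta_0 = \{1 + o_p(1)\}\Bigl\{\frac{U_{0n}(\tau)}{\kappa(\tau)} + \frac{b_{0n}(\tau)}{\kappa(\tau)}\Bigr\} + \{\nu_1 + o_p(1)\}^T(\hat\theta_1 - \hat\theta_2) + o_p(h^{p+1}). \tag{A.8}$$

By applying the martingale property [13], we can easily prove that

$$\sqrt{nh}\,U_{0n}(\tau) \xrightarrow{D} N\Bigl(0,\ \kappa(\tau)\int K^2(u)\,du\Bigr). \tag{A.9}$$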



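In addition, it follows from Theorem 4 of [12] that
$\sqrt{nh_1}\,\hat\theta_l = O_p(1)$ for $l = 1, 2$. We conclude that

$$\sqrt{nh}\,\hat\theta_l = \sqrt{h/h_1}\,\sqrt{nh_1}\,\hat\theta_l = o_p(1), \qquad l = 1, 2, \tag{A.10}$$

since $h/h_1 \to 0$ by Condition 5 of Theorem 1. Finally, from (A.7)-(A.10),
we have

$$\sqrt{nh}\Bigl\{\hat\theta_0 - \frac{b_{0n}(\tau)}{\kappa(\tau)}\Bigr\} \xrightarrow{D} N\Bigl(0,\ \frac{\int K^2(u)\,du}{\kappa(\tau)}\Bigr).$$

To finish the proof of Theorem 1, it suffices to prove (A.6). □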
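Reading (A.8) and (A.9) together with (A.10), the leading bias of $\hat\theta_0$ is $b_{0n}(\tau)/\kappa(\tau)$, which by the bias calculation in the proof of (A.6) equals $h^{p+1}\{\psi^{(p+1)}(x_2) - \psi^{(p+1)}(x_1)\}\int u^{p+1}K(u)\,du/(p+1)!$, and its asymptotic variance is $\int K^2(u)\,du/\{nh\,\kappa(\tau)\}$. The hypothetical helper `theta0_bias_variance` below simply evaluates these two expressions; the Epanechnikov kernel and all input values are illustrative assumptions.

```python
import numpy as np
from math import factorial


def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1] (an assumed kernel choice)."""
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)


def trapezoid(f, u):
    """Trapezoidal rule on an increasing grid."""
    return float(np.sum((f[1:] + f[:-1]) * np.diff(u)) / 2.0)


U = np.linspace(-1.0, 1.0, 20001)   # fine grid on the kernel support
K = epanechnikov(U)


def theta0_bias_variance(n, h, p, dpsi_x1, dpsi_x2, kappa):
    """Leading bias b_0n/kappa and variance int K^2 / (n h kappa) of theta0_hat.

    dpsi_x1, dpsi_x2 are the (p+1)-th derivatives of psi at x1 and x2; kappa is kappa(tau).
    """
    mu = trapezoid(U ** (p + 1) * K, U)    # int u^{p+1} K(u) du (zero for even p)
    r_k = trapezoid(K ** 2, U)              # int K^2(u) du (= 3/5 for Epanechnikov)
    bias = h ** (p + 1) / factorial(p + 1) * (dpsi_x2 - dpsi_x1) * mu
    var = r_k / (n * h * kappa)
    return bias, var


bias, var = theta0_bias_variance(n=1000, h=0.2, p=1, dpsi_x1=0.0, dpsi_x2=2.0, kappa=0.13)
print(bias, var)
```

For a symmetric kernel and even $p$, $\int u^{p+1}K(u)\,du = 0$, so this leading bias term vanishes at order $h^{p+1}$.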

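Proof of (A.6). By taking the derivative with respect to $\theta_0$ in (A.3),
we obtain

$$\frac{\partial l_n(0,0,0,\tau)}{\partial\theta_0} = U_{0n}(\tau) + B_{0n}(\tau),$$

where

$$U_{0n}(\tau) = n^{-1}\sum_{i=1}^n \int_0^\tau \Bigl\{K_h(X_i - x_2) - \frac{S_{n,1}(0,0,0,v)}{S_{n,0}(0,0,0,v)}[K_h(X_i - x_1) + K_h(X_i - x_2)]\Bigr\}\,dM_i(v)$$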
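and

$$\begin{aligned}
B_{0n}(\tau) &= n^{-1}\sum_{i=1}^n \int_0^\tau \Bigl\{K_h(X_i - x_2) - \frac{S_{n,1}(0,0,0,v)}{S_{n,0}(0,0,0,v)}[K_h(X_i - x_1) + K_h(X_i - x_2)]\Bigr\} Y_i(v)\exp\{\psi(X_i)\}\lambda_0(v)\,dv \\
&= B_{01n}(\tau) + B_{02n}(\tau).
\end{aligned}$$

Here

$$B_{01n}(\tau) = -n^{-1}\sum_{i=1}^n \int_0^\tau \frac{S_{n,1}(0,0,0,v)}{S_{n,0}(0,0,0,v)}K_h(X_i - x_1)Y_i(v)\exp\{\psi(X_i)\}\lambda_0(v)\,dv$$

and

$$B_{02n}(\tau) = n^{-1}\sum_{i=1}^n \int_0^\tau K_h(X_i - x_2)\Bigl\{1 - \frac{S_{n,1}(0,0,0,v)}{S_{n,0}(0,0,0,v)}\Bigr\}Y_i(v)\exp\{\psi(X_i)\}\lambda_0(v)\,dv.$$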


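Define

$$B_{01n}^*(\tau) = -n^{-1}\sum_{i=1}^n \int_0^\tau \frac{S_{n,1}(0,0,0,v)}{S_{n,0}(0,0,0,v)}K_h(X_i - x_1)Y_i(v)\exp\{\psi(x_1) + X_{1i}^{*T}\beta_1^*\}\lambda_0(v)\,dv,$$

$$B_{02n}^*(\tau) = n^{-1}\sum_{i=1}^n \int_0^\tau K_h(X_i - x_2)\Bigl\{1 - \frac{S_{n,1}(0,0,0,v)}{S_{n,0}(0,0,0,v)}\Bigr\}Y_i(v)\exp\{\psi(x_2) + X_{2i}^{*T}\beta_2^*\}\lambda_0(v)\,dv.$$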
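Similarly to the proof of (6.24) in [12], and by the fact that

$$\frac{S_{n,1}(0,0,0,v)}{S_{n,0}(0,0,0,v)} = \frac{\exp(\alpha)f(x_2)S(v\mid x_2)}{f(x_1)S(v\mid x_1) + \exp(\alpha)f(x_2)S(v\mid x_2)} + o_p(1) = \frac{a_{x_2}(v)}{a_{x_1}(v) + a_{x_2}(v)} + o_p(1) = \gamma(v) + o_p(1),$$

we have

$$\begin{aligned}
B_{01n}(\tau) - B_{01n}^*(\tau) &= -\frac{h^{p+1}\psi^{(p+1)}(x_1)}{(p+1)!}\,\kappa(\tau)\int u^{p+1}K(u)\,du + o_p(h^{p+1}), \\
B_{02n}(\tau) - B_{02n}^*(\tau) &= \frac{h^{p+1}\psi^{(p+1)}(x_2)}{(p+1)!}\,\kappa(\tau)\int u^{p+1}K(u)\,du + o_p(h^{p+1}).
\end{aligned}$$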
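Furthermore,

$$\begin{aligned}
B_{01n}^*(\tau) + B_{02n}^*(\tau) ={}& n^{-1}\sum_{i=1}^n \int_0^\tau Y_i(v)K_h(X_i - x_2)\exp\{\psi(x_2) + X_{2i}^{*T}\beta_2^*\}\lambda_0(v)\,dv \\
&- \int_0^\tau \frac{S_{n,1}(0,0,0,v)}{S_{n,0}(0,0,0,v)}\, n^{-1}\sum_{i=1}^n Y_i(v)\bigl[K_h(X_i - x_1)\exp\{\psi(x_1) + X_{1i}^{*T}\beta_1^*\} \\
&\qquad + K_h(X_i - x_2)\exp\{\psi(x_2) + X_{2i}^{*T}\beta_2^*\}\bigr]\lambda_0(v)\,dv \\
={}& \exp\{\psi(x_1)\}\Bigl\{\int_0^\tau S_{n,1}(0,0,0,v)\lambda_0(v)\,dv - \int_0^\tau \frac{S_{n,1}(0,0,0,v)}{S_{n,0}(0,0,0,v)}S_{n,0}(0,0,0,v)\lambda_0(v)\,dv\Bigr\} = 0.
\end{aligned}$$

We conclude that

$$B_{0n}(\tau) = \{B_{01n}(\tau) - B_{01n}^*(\tau)\} + \{B_{02n}(\tau) - B_{02n}^*(\tau)\} = b_{0n}(\tau) + o_p(h^{p+1}),$$

where $b_{0n}(\tau)$ is defined in (A.7). □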
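The cancellation $B_{01n}^*(\tau) + B_{02n}^*(\tau) = 0$ is purely algebraic, so it holds for any data set. The sketch below builds $S_{n,0}(0,0,0,v)$ and $S_{n,1}(0,0,0,v)$ on a grid of $v$ from simulated covariates and times with $p = 1$, using hypothetical values for $\psi(x_1)$, $\alpha$, the local slopes, the bandwidth and $\lambda_0$, and a simple Riemann sum for the $v$-integral. The two terms are individually nonzero but sum to zero up to rounding.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1] (an assumed kernel choice)."""
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)


rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(-1.0, 1.0, n)
y = rng.exponential(1.0, n)              # observed times; any positive values work here
x1, x2, h = -0.5, 0.5, 0.3
psi_x1, alpha = 0.2, 0.7                 # hypothetical psi(x1) and alpha = psi(x2) - psi(x1)
psi_x2 = psi_x1 + alpha
beta1, beta2 = 0.4, -0.2                 # hypothetical local slopes (p = 1)

v = np.linspace(0.0, 1.5, 301)
dv = np.gradient(v)                      # quadrature weights for the v-integral
lam0 = 1.0 + 0.5 * v                     # hypothetical baseline hazard
at_risk = (y[None, :] >= v[:, None]).astype(float)   # Y_i(v) for each grid point v

kh1 = epanechnikov((x - x1) / h) / h
kh2 = epanechnikov((x - x2) / h) / h
w1 = at_risk @ (kh1 * np.exp(beta1 * (x - x1))) / n            # x1-part of S_{n,0}(0,0,0,v)
w2 = at_risk @ (kh2 * np.exp(alpha + beta2 * (x - x2))) / n    # S_{n,1}(0,0,0,v)
ratio = w2 / (w1 + w2)                                          # S_{n,1} / S_{n,0}

# Averages of K_h(X_i - x_k) Y_i(v) exp{psi(x_k) + X*_{ki} beta*_k} used in B*_{01n}, B*_{02n}
e1 = at_risk @ (kh1 * np.exp(psi_x1 + beta1 * (x - x1))) / n
e2 = at_risk @ (kh2 * np.exp(psi_x2 + beta2 * (x - x2))) / n
b01 = -np.sum(ratio * e1 * lam0 * dv)
b02 = np.sum((1.0 - ratio) * e2 * lam0 * dv)
print(b01, b02, b01 + b02)
```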

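Proof of Theorem 2. Define

$$A(\tau, x) = \int_0^\tau S(v\mid x)\lambda_0(v)\,dv,$$

$$\vartheta(x) = \frac{p_{2x}\exp\{\psi_2(x)\}}{p_{1x}\exp\{\psi_1(x)\} + p_{2x}\exp\{\psi_2(x)\}}, \qquad \kappa(x) = \frac{p_{1x}\exp\{\psi_1(x)\}\,p_{2x}\exp\{\psi_2(x)\}}{p_{1x}\exp\{\psi_1(x)\} + p_{2x}\exp\{\psi_2(x)\}}. \tag{A.11}$$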

The proof of Theorem 2 is similar to that of Theorem 1. Therefore, we provide
only the main steps that differ from those in the proof of Theorem 1. Denote
$\eta = (\eta_0, \eta_1^T, \eta_2^T)^T$, where $\eta_0$ is the deviation from
$\rho$ (so that $\hat\eta_0 = \hat\rho - \rho$) and
$\eta_k = H^T(\beta_k - \beta_k^*)$ for $k = 1, 2$. Then maximizing $L_n$ of
Section 2.2 is equivalent to maximizing $l_n(\eta, \infty)$, where
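$$\begin{aligned}
l_n(\eta, \tau) ={}& \int_0^\tau n^{-1}\sum_{i=1}^n K_h(X_i - x)\bigl[I_{1i}(X_i^{*T}\beta_1^* + U_i^{*T}\eta_1) + I_{2i}(\rho + \eta_0 + X_i^{*T}\beta_2^* + U_i^{*T}\eta_2)\bigr]\,dN_i(v) \\
&- \int_0^\tau n^{-1}\sum_{i=1}^n K_h(X_i - x)(I_{1i} + I_{2i})\log\{nS_n(\eta, v)\}\,dN_i(v).
\end{aligned} \tag{A.12}$$

Here, $N_i(t)$ [and later $M_i(t)$] are defined in (A.1) and (A.2) in the proof
of Theorem 1, $U_i^* = (H^T)^{-1}X_i^*$ with $H$ defined in (A.0), and

$$S_n(\eta, v) = n^{-1}\sum_{i=1}^n Y_i(v)K_h(X_i - x)\bigl[I_{1i}\exp(X_i^{*T}\beta_1^* + U_i^{*T}\eta_1) + I_{2i}\exp(\rho + \eta_0 + X_i^{*T}\beta_2^* + U_i^{*T}\eta_2)\bigr].$$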

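By an argument similar to that in the proof of Theorem 1, and by noting that

$$S_n(\eta_0, 0, 0, v) = [p_{1x}\exp\{\psi_1(x)\} + p_{2x}\exp\{\psi_2(x) + \eta_0\}]\exp\{-\psi_1(x)\}f(x)S(v\mid x) + o_p(1),$$

we can prove that

$$\begin{aligned}
l_n(\eta_0, \hat\eta_1, \hat\eta_2, \tau) - l_n(0, 0, 0, \tau) ={}& [p_{1x}\exp\{\psi_1(x)\} + p_{2x}\exp\{\psi_2(x)\}]f(x)A(\tau, x) \\
&\times\Bigl\{\vartheta(x)\eta_0 - \log\frac{p_{1x}\exp\{\psi_1(x)\} + p_{2x}\exp\{\psi_2(x) + \eta_0\}}{p_{1x}\exp\{\psi_1(x)\} + p_{2x}\exp\{\psi_2(x)\}}\Bigr\} + o_p(1).
\end{aligned}$$

Obviously, the right-hand side of the above equation is a strictly concave
function of $\eta_0$ with its maximum at $\eta_0 = 0$. Since
$l_n(\eta_0, \hat\eta_1, \hat\eta_2, \tau)$ is a concave function of $\eta_0$,
the convexity lemma gives $\hat\eta_0 \to 0$ in probability.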

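Next, we prove asymptotic normality. Note that

$$0 = \frac{\partial l_n(\hat\eta, \tau)}{\partial\eta_0} = \frac{\partial l_n(0, \tau)}{\partial\eta_0} + \frac{\partial^2 l_n(\bar\eta, \tau)}{\partial\eta_0^2}\hat\eta_0 + \frac{\partial^2 l_n(\bar\eta, \tau)}{\partial\eta_0\,\partial\eta_1^T}\hat\eta_1 + \frac{\partial^2 l_n(\bar\eta, \tau)}{\partial\eta_0\,\partial\eta_2^T}\hat\eta_2, \tag{A.13}$$

where $\bar\eta$ lies on the line segment between 0 and $\hat\eta$. Using
arguments similar to those in the proof of Theorem 1, we can show that, for any
$\bar\eta \to 0$ in probability,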



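$$\begin{aligned}
\frac{\partial^2 l_n(\bar\eta, \tau)}{\partial\eta_0^2} &= -\kappa(x)f(x)A(\tau, x) + o_p(1), \\
\frac{\partial^2 l_n(\bar\eta, \tau)}{\partial\eta_0\,\partial\eta_1^T} &= \kappa(x)f(x)A(\tau, x)\nu_1^T + o_p(1), \\
\frac{\partial^2 l_n(\bar\eta, \tau)}{\partial\eta_0\,\partial\eta_2^T} &= -\kappa(x)f(x)A(\tau, x)\nu_1^T + o_p(1),
\end{aligned} \tag{A.14}$$

with $\kappa(x)$ and $A(\tau, x)$ defined in (A.11). The score
$\partial l_n(0, \tau)/\partial\eta_0$ can be expressed as

$$\frac{\partial l_n(0, \tau)}{\partial\eta_0} = D_n(\tau) + b_{1n}(\tau), \tag{A.15}$$

where

$$\begin{aligned}
D_n(\tau) &= \int_0^\tau n^{-1}\sum_{i=1}^n K_h(X_i - x)\{I_{2i} - (I_{1i} + I_{2i})q_n(v)\}\,dM_i(v), \\
b_{1n}(\tau) &= \int_0^\tau n^{-1}\sum_{i=1}^n K_h(X_i - x)\{I_{2i} - (I_{1i} + I_{2i})q_n(v)\}Y_i(v)\exp\{\psi(X_i, Z_i)\}\lambda_0(v)\,dv,
\end{aligned}$$

with

$$q_n(v) = \frac{n^{-1}\sum_{i=1}^n Y_i(v)K_h(X_i - x)I_{2i}\exp(\rho + X_i^{*T}\beta_2^*)}{S_n(0, v)}.$$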

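By Taylor expanding $\psi(X_i, Z_i)$ at $(x, z_1)$ and $(x, z_2)$, we have, for
$k = 1, 2$,

$$I_{ki}\bigl[\exp\{\psi(X_i, Z_i)\} - \exp\{\psi_k(x) + X_i^{*T}\beta_k^*\}\bigr] = I_{ki}\exp\{\psi_k(x)\}\frac{\psi_k^{(p+1)}(x)}{(p+1)!}(X_i - x)^{p+1} + o_p(h^{p+1}),$$

and we obtain for $b_{1n}(\tau)$

$$b_{1n}(\tau) = \frac{h^{p+1}}{(p+1)!}\{\psi_2^{(p+1)}(x) - \psi_1^{(p+1)}(x)\}\,\kappa(x)f(x)A(\tau, x)\int u^{p+1}K(u)\,du + o_p(h^{p+1}). \tag{A.16}$$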

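Furthermore, since

$$\begin{aligned}
\langle\sqrt{nh}D_n(\tau), \sqrt{nh}D_n(\tau)\rangle ={}& h\int_0^\tau n^{-1}\sum_{i=1}^n K_h^2(X_i - x)\{-I_{1i}q_n(v) + I_{2i}(1 - q_n(v))\}^2 Y_i(v)\exp\{\psi(X_i, Z_i)\}\lambda_0(v)\,dv \\
={}& h\int_0^\tau n^{-1}\sum_{i=1}^n K_h^2(X_i - x)\{I_{1i}q_n^2(v) + I_{2i}(1 - q_n(v))^2\}Y_i(v)\exp\{\psi(X_i, Z_i)\}\lambda_0(v)\,dv \\
\xrightarrow{P}{}& \bigl[p_{1x}\exp\{\psi_1(x)\}\vartheta^2(x) + p_{2x}\exp\{\psi_2(x)\}\{1 - \vartheta(x)\}^2\bigr]f(x)A(\tau, x)\int K^2(u)\,du \\
={}& \kappa(x)f(x)A(\tau, x)\int K^2(u)\,du \equiv \sigma^2(\tau, x),
\end{aligned}$$

a straightforward application of the martingale central limit theorem results
in

$$\sqrt{nh}\,D_n(\tau) \xrightarrow{D} N(0, \sigma^2(\tau, x)). \tag{A.17}$$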
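The last equality in the variance computation uses the identity $p_{1x}e^{\psi_1(x)}\vartheta^2(x) + p_{2x}e^{\psi_2(x)}\{1 - \vartheta(x)\}^2 = \kappa(x)$, where $\vartheta(x) = p_{2x}e^{\psi_2(x)}/\{p_{1x}e^{\psi_1(x)} + p_{2x}e^{\psi_2(x)}\}$ is the first ratio in (A.11). Writing $e_k = p_{kx}e^{\psi_k(x)}$, the left side is $(e_1e_2^2 + e_2e_1^2)/(e_1 + e_2)^2 = e_1e_2/(e_1 + e_2)$. A quick randomized check, with arbitrary (hypothetical) values of $p_{kx}$ and $\psi_k(x)$ and a helper `weights_a11` introduced only for this illustration:

```python
import numpy as np


def weights_a11(p1, p2, psi1, psi2):
    """vartheta(x) and kappa(x) of (A.11) at a single x."""
    e1 = p1 * np.exp(psi1)
    e2 = p2 * np.exp(psi2)
    return e2 / (e1 + e2), e1 * e2 / (e1 + e2)


rng = np.random.default_rng(3)
max_gap = 0.0
for _ in range(1000):
    p1, p2 = rng.uniform(0.05, 0.95, size=2)
    psi1, psi2 = rng.normal(size=2)
    theta, kappa = weights_a11(p1, p2, psi1, psi2)
    # Left-hand side of the identity used in the predictable-variation limit
    lhs = p1 * np.exp(psi1) * theta ** 2 + p2 * np.exp(psi2) * (1.0 - theta) ** 2
    max_gap = max(max_gap, abs(lhs - kappa))
print(max_gap)   # rounding-level only
```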
From (A.13)-(A.17) and under the condition $h/h_1 \to 0$,

$$\sqrt{nh}\,\{\hat\rho - \rho - b_n(\tau)\} \xrightarrow{D} N(0, \sigma(\tau)^2),$$

where

$$\sigma(\tau)^2 = \frac{\int K^2(u)\,du}{\kappa(x)f(x)A(\tau, x)}.$$

□